pyexcel - Let you focus on data, instead of file formats

Author:C.W.
Source code:http://github.com/pyexcel/pyexcel
Issues:http://github.com/pyexcel/pyexcel/issues
License:New BSD License
Version:0.2.5
Generated:September 01, 2016

Introduction

pyexcel provides one application programming interface to read, manipulate and write data in different excel formats. This library makes information processing involving excel files an enjoyable task. The data in excel files can be turned into array or dict with least code, vice versa. And ready-made custom filters and formatters can be applied. This library focuses on data processing using excel files as storage media hence fonts, colors and charts were not and will not be considered.

Excel files are de-facto file format for information sharing in non-software centric organisations. Excel files are not only used for mathematical computation in financial institutions but also used for many other purposes in an office work environment. This is largely casued by wide adoption of Microsoft Office. Comparing the existing, mathematics savvy Pandas library, this library intends to help data processing job where data extraction is more important than data analysis. In such context, ease of use, and low overhead is preferred, while Pandas is as big as 4MB and contains hundreds of potentially useful functions.

Note

Since version 0.2.2, no longer a plugin should be explicitly imported. They are imported if they are installed. Please use pip to manage the plugins.

Getting the source

Source code is hosted in github. You can get it using git client:

$ git clone http://github.com/pyexcel/pyexcel.git

Installation

You can install it via pip:

$ pip install pyexcel

or clone it and install it:

$ git clone http://github.com/pyexcel/pyexcel.git
$ cd pyexcel
$ python setup.py install

For individual excel file formats, please install them as you wish:

a map of plugins and supported excel file formats
Plugin Supported file formats Dependencies Python versions Comments
pyexcel csv, csvz [1], tsv, tsvz [2] pyexcel-io 2.6, 2.7, 3.3, 3.4, 3.5, pypy  
xls xls, xlsx(read only), xlsm(read only) xlrd, xlwt 2.6, 2.7, 3.3, 3.4, 3.5, pypy supports reading xlsx as well
xlsx xlsx openpyxl 2.6, 2.7, 3.3, 3.4, 3.5, pypy  
ods3 ods ezodf, lxml 2.6, 2.7, 3.3, 3.4, 3.5,  
ods ods (python 2.6, 2.7) odfpy 2.6, 2.7  
text json, rst, mediawiki, latex, grid, etc. tabulate 2.6, 2.7, 3.3, 3.4, 3.5, pypy writing to files only
Plugin compatibility table
pyexcel pyexcel-io xls xlsx ods ods3 text
0.2.2+ 0.2.0 0.2.0 0.2.0 0.2.0 0.2.0 0.2.1+
0.2.1 0.1.0 0.1.0 0.1.0 0.1.0+ 0.1.0+ 0.2.0
0.2.0 0.1.0 0.1.0 0.1.0 0.1.0+ 0.1.0+ 0.1.0+

Usage

Suppose you want to process the following excel data :

Name Age
Adam 28
Beatrice 29
Ceri 30
Dean 26

Here are the example usages:

>>> import pyexcel as pe
>>> records = pe.get_records(file_name="your_file.xls")
>>> for record in records:
...     print("%s is aged at %d" % (record['Name'], record['Age']))
Adam is aged at 28
Beatrice is aged at 29
Ceri is aged at 30
Dean is aged at 26

Design

Introduction

This section introduces Excel data models, its representing data structures and provides an overview of formatting, transformation, manipulation supported by pyexcel

Data models and data structures

When dealing with excel files, there are three primary objects: cell, sheet and book. A book contains one or more sheets and a sheet is consisted of a sheet name and a two dimensional array of cells. Although a sheet can contain charts and a cell can have formular, styling properties, this library ignores them and pay attention to the data in the cell and its data type. So, in the context of this library, the definition of those three concepts are:

concept definition pyexcel data model
a cell is a data unit a Python data type
a sheet is a named two dimensional array of data units Sheet
a book is a dictionary of two dimensional array of data units. Book

Data source

The most popular data source is an excel file. Libre Offcie/Microsoft Excel could easily generate an new excel file of desired format. Besides a physical file, this library recognizes additional three additional sources:

  1. Excel files in computer memory. For example when a file was uploaded to a Python server for information processing, if it is relatively small, it will be stored in memory.
  2. Database tables. For example, a client would like to have a snapshot of some database table in an excel file and ask it to be sent to him.
  3. Python structures. For example, a developer may have scrapped a site and hence stored data in Python array or dictionary. He may want to save those information as a file.

Data format

This library and its plugins support most of the frequently used excel file formats.

file format defintion Single Sheet
csv comma separated values Yes
tsv tab separated values Yes
csvz a zip file that contains one or many csv files  
tsvz a zip file that contains one or many tsv files  
xls a spreadsheet file format created by MS-Excel 97-2003 [1]  
xlsx MS-Excel Extensions to the Office Open XML SpreadsheetML File Format. [2]  
xlsm an MS-Excel Macro-Enabled Workbook file  
ods open document spreadsheet  
json java script object notation  

See also a map of plugins and supported excel file formats.

Data transformation

Quite often, a developer would like to have the excel data in a Python data structures. This library supports the conversions from previous three data source to the following list of data strcutures, and vice versa.

A list of supported data structures
Psudo name Python name Related model
two dimensional array a list of lists Sheet
a dictionary of one dimensional arrays a dictionary of lists Sheet
a list of dictionaries a list of dictionaries Sheet
a dictionary of two dimensional arrays a dictionary of lists of lists Book

Examples:

>>> two_dimensional_list = [
...    [1, 2, 3, 4],
...    [5, 6, 7, 8],
...    [9, 10, 11, 12],
... ]
>>> a_dictionary_of_one_dimensional_arrays = {
...     "Column 1": [1, 2, 3, 4],
...     "Column 2": [5, 6, 7, 8],
...     "Column 3": [9, 10, 11, 12],
... }
>>> a_list_of_dictionaries = [
...     {
...         "Name": 'Adam',
...         "Age": 28
...     },
...     {
...         "Name": 'Beatrice',
...         "Age": 29
...     },
...     {
...         "Name": 'Ceri',
...         "Age": 30
...     },
...     {
...         "Name": 'Dean',
...         "Age": 26
...     }
... ]
>>> a_dictionary_of_two_dimensional_arrays = {
...      'Sheet 1':
...          [
...              [1.0, 2.0, 3.0],
...              [4.0, 5.0, 6.0],
...              [7.0, 8.0, 9.0]
...          ],
...      'Sheet 2':
...          [
...              ['X', 'Y', 'Z'],
...              [1.0, 2.0, 3.0],
...              [4.0, 5.0, 6.0]
...          ],
...      'Sheet 3':
...          [
...              ['O', 'P', 'Q'],
...              [3.0, 2.0, 1.0],
...              [4.0, 3.0, 2.0]
...          ]
...  }

Data manipulations

The main operation on a cell involves cell access, formatting and cleansing. The main operation on a sheet involves the group access to a row or a column, data filtering and data transformation. The main operation in a book is obtain access to individual sheets.

[1]quoted from whatis.com. Technical details can be found at MSDN XLS
[2]xlsx is used by MS-Excel 2007, more infomation can be found at MSDN XLSX

Signature functions

Import data into Python

This library provides one application programming interface to read data from one of the following data sources:

  • physical file
  • memory file
  • SQLAlchemy table
  • Django Model
  • Python data stuctures: dictionary, records and array

and to transform them into one of the data structures:

  • two dimensinal array
  • a dictionary of one dimensional arrays
  • a list of dictionaries
  • a dictionary of two dimensional arrays
  • a Sheet
  • a Book
Four data access functions

It is believed that once a Python developer could easily operate on list, dictionary and various mixture of both. This library provides four module level functions to help you obtain excel data in those formats. Please refer to “A list of module level functions”, the first three functions operates on any one sheet from an excel book and the fourth one returns all data in all sheets in an excel book.

A list of module level functions
Functions Psudo name Python name
get_array() two dimensional array a list of lists
get_dict() a dictionary of one dimensional arrays an ordered dictionary of lists
get_records() a list of dictionaries a list of dictionaries
get_book_dict() a dictionary of two dimensional arrays a dictionary of lists of lists

See also:

Two native functions

In cases where the excel data needs custom manipulations, a pyexcel user got a few choices: one is to use Sheet and Book, the other is to look for more sophisticated ones:

  • Pandas, for numerical analysis
  • Do-it-yourself
Functions Returns
get_sheet() Sheet
get_book() Book

For all six functions, you can pass on the same command parameters while the return value is what the function says.

Export data from Python

This library provides one application programming interface to transform them into one of the data structures:

  • two dimensinal array
  • a (ordered) dictionary of one dimensional arrays
  • a list of dictionaries
  • a dictionary of two dimensional arrays
  • a Sheet
  • a Book

and write to one of the following data sources:

  • physical file
  • memory file
  • SQLAlchemy table
  • Django Model
  • Python data stuctures: dictionary, records and array

Here are the two functions:

Functions Description
save_as() Works well with single sheet file
save_book_as() Works with multiple sheet file

See also:

Data transportation/transcoding

Based the capability of this library, it is capable of transporting your data in between any of these data sources:

  • physical file
  • memory file
  • SQLAlchemy table
  • Django Model
  • Python data stuctures: dictionary, records and array

See also:

Tutorial

Sheet: Data Access

Random access to individual cell

To randomly access a cell of Sheet instance, two syntax are available:

sheet[row, column]

or:

sheet['A1']

The former syntax is handy when you know the row and column numbers. The latter syntax is introduced to help you convert the excel column header such as “AX” to integer numbers.

Suppose you have the following data, you can get value 5 by reader[2, 2].

Example X Y Z
a 1 2 3
b 4 5 6
c 7 8 9

Here is the example code showing how you can randomly access a cell:

>>> import pyexcel
>>> sheet = pyexcel.get_sheet(file_name="example.xls")
>>> sheet.content
+---------+---+---+---+
| Example | X | Y | Z |
+---------+---+---+---+
| a       | 1 | 2 | 3 |
+---------+---+---+---+
| b       | 4 | 5 | 6 |
+---------+---+---+---+
| c       | 7 | 8 | 9 |
+---------+---+---+---+
>>> print(sheet[2, 2])
5
>>> print(sheet["C3"])
5

Random access to rows and columns

Continue with previous excel file, you can access row and column separately:

>>> sheet.row[1]
['a', 1, 2, 3]
>>> sheet.column[2]
['Y', 2, 5, 8]

Use custom names instead of index

Alternatively, it is possible to use the first row to refer to each columns:

>>> sheet.name_columns_by_row(0)
>>> print(sheet[1, "Y"])
5

You have noticed the row index has been changed. It is because first row is taken as the column names, hence all rows after the first row are shifted. Now accessing the columns are changed too:

>>> sheet.column['Y']
[2, 5, 8]

Hence access the same cell, this statement also works:

>>> sheet.column['Y'][1]
5

Further more, it is possible to use first column to refer to each rows:

>>> sheet.name_rows_by_column(0)

To access the same cell, we can use this line:

>>> sheet.row["b"][1]
5

For the same reason, the row index has been reduced by 1. Since we have named columns and rows, it is possible to access the same cell like this:

>>> print(sheet["b", "Y"])
5

For multiple sheet file, you can regard it as three dimensional array if you use Book. So, you access each cell via this syntax:

book[sheet_index][row, column]

or:

book["sheet_name"][row, column]

Suppose you have the following sheets:

Sheet 1
1 2 3
4 5 6
7 8 9
Sheet 2
X Y Z
1 2 3
4 5 6
Sheet 3
O P Q
3 2 1
4 3 2

And you can randomly access a cell in a sheet:

>>> book = pyexcel.get_book(file_name="example.xls")
>>> print(book["Sheet 1"][0,0])
1
>>> print(book[0][0,0]) # the same cell
1

Tip

With pyexcel, you can regard single sheet reader as an two dimensional array and multi-sheet excel book reader as a ordered dictionary of two dimensional arrays.

Reading a single sheet excel file

Suppose you have a csv, xls, xlsx file as the following:

1 2 3
4 5 6
7 8 9

The following code will give you the data in json:

>>> import json
>>> # "example.csv","example.xlsx","example.xlsm"
>>> sheet = pyexcel.get_sheet(file_name="example.xls")
>>> print(json.dumps(sheet.to_array()))
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Read the sheet as a dictionary

Suppose you have a csv, xls, xlsx file as the following:

Column 1 Column 2 Column 3
1 4 7
2 5 8
3 6 9

The following code will give you data series in a dictionary:

>>> # "example.xls","example.xlsx","example.xlsm"
>>> sheet = pyexcel.get_sheet(file_name="example_series.xls", name_columns_by_row=0)
>>> sheet.to_dict()
OrderedDict([('Column 1', [1, 4, 7]), ('Column 2', [2, 5, 8]), ('Column 3', [3, 6, 9])])
Can I get an array of dictionaries per each row?

Suppose you have the following data:

X Y Z
1 2 3
4 5 6
7 8 9

The following code will produce what you want:

>>> # "example.csv","example.xlsx","example.xlsm"
>>> sheet = pyexcel.get_sheet(file_name="example.xls", name_columns_by_row=0)
>>> records = sheet.to_records()
>>> for record in records:
...     keys = sorted(record.keys())
...     print("{")
...     for key in keys:
...         print("'%s':%d" % (key, record[key]))
...     print("}")
{
'X':1
'Y':2
'Z':3
}
{
'X':4
'Y':5
'Z':6
}
{
'X':7
'Y':8
'Z':9
}
>>> print(records[0]["X"]) # access first row and first item
1

Writing a single sheet excel file

Suppose you have an array as the following:

1 2 3
4 5 6
7 8 9

The following code will write it as an excel file of your choice:

.. testcode::
>>> array = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> # "output.xls" "output.xlsx" "output.ods" "output.xlsm"
>>> sheet = pyexcel.Sheet(array)
>>> sheet.save_as("output.csv")

Suppose you have a dictionary as the following:

Column 1 Column 2 Column 3
1 4 7
2 5 8
3 6 9

The following code will write it as an excel file of your choice:

>>> example_dict = {"Column 1": [1, 2, 3], "Column 2": [4, 5, 6], "Column 3": [7, 8, 9]}
>>> # "output.xls" "output.xlsx" "output.ods" "output.xlsm"
>>> sheet = pyexcel.get_sheet(adict=example_dict)
>>> sheet.save_as("output.csv")

Write multiple sheet excel file

Suppose you have previous data as a dictionary and you want to save it as multiple sheet excel file:

>>> content = {
...     'Sheet 1':
...         [
...             [1.0, 2.0, 3.0],
...             [4.0, 5.0, 6.0],
...             [7.0, 8.0, 9.0]
...         ],
...     'Sheet 2':
...         [
...             ['X', 'Y', 'Z'],
...             [1.0, 2.0, 3.0],
...             [4.0, 5.0, 6.0]
...         ],
...     'Sheet 3':
...         [
...             ['O', 'P', 'Q'],
...             [3.0, 2.0, 1.0],
...             [4.0, 3.0, 2.0]
...         ]
... }
>>> book = pyexcel.get_book(bookdict=content)
>>> book.save_as("output.xls")

You shall get a xls file

Read multiple sheet excel file

Let’s read the previous file back:

>>> book = pyexcel.get_book(file_name="output.xls")
>>> sheets = book.to_dict()
>>> for name in sheets.keys():
...     print(name)
Sheet 1
Sheet 2
Sheet 3

Work with data series in a single sheet

Suppose you have the following data in any of the supported excel formats again:

Column 1 Column 2 Column 3
1 4 7
2 5 8
3 6 9
>>> sheet = pyexcel.get_sheet(file_name="example_series.xls", name_columns_by_row=0)
Play with data

You can get headers:

>>> print(list(sheet.colnames))
['Column 1', 'Column 2', 'Column 3']

You can use a utility function to get all in a dictionary:

>>> sheet.to_dict()
OrderedDict([('Column 1', [1, 4, 7]), ('Column 2', [2, 5, 8]), ('Column 3', [3, 6, 9])])

Maybe you want to get only the data without the column headers. You can call rows() instead:

>>> pyexcel.to_array(sheet.rows())
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

You can get data from the bottom to the top one by calling rrows() instead:

>>> pyexcel.to_array(sheet.rrows())
[[7, 8, 9], [4, 5, 6], [1, 2, 3]]

You might want the data arranged vertically. You can call columns() instead:

>>> pyexcel.to_array(sheet.columns())
[[1, 4, 7], [2, 5, 8], [3, 6, 9]]

You can get columns in reverse sequence as well by calling rcolumns() instead:

>>> pyexcel.to_array(sheet.rcolumns())
[[3, 6, 9], [2, 5, 8], [1, 4, 7]]

Do you want to flatten the data? You can get the content in one dimensional array. If you are interested in playing with one dimensional enumeration, you can check out these functions enumerate(), reverse(), vertical(), and rvertical():

>>> pyexcel.to_array(sheet.enumerate())
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> pyexcel.to_array(sheet.reverse())
[9, 8, 7, 6, 5, 4, 3, 2, 1]
>>> pyexcel.to_array(sheet.vertical())
[1, 4, 7, 2, 5, 8, 3, 6, 9]
>>> pyexcel.to_array(sheet.rvertical())
[9, 6, 3, 8, 5, 2, 7, 4, 1]

Sheet: Data manipulation

The data in a sheet is represented by Sheet which maintains the data as a list of lists. You can regard Sheet as a two dimensional array with additional iterators. Random access to individual column and row is explosed by NamedColumn and NamedRow

Column manipulation

Suppose have one data file as the following:

>>> sheet = pyexcel.get_sheet(file_name="example.xls", name_columns_by_row=0)
>>> sheet
pyexcel sheet:
+----------+----------+----------+
| Column 1 | Column 2 | Column 3 |
+==========+==========+==========+
| 1        | 4        | 7        |
+----------+----------+----------+
| 2        | 5        | 8        |
+----------+----------+----------+
| 3        | 6        | 9        |
+----------+----------+----------+

And you want to update Column 2 with these data: [11, 12, 13]

>>> sheet.column["Column 2"] = [11, 12, 13]
>>> sheet.column[1]
[11, 12, 13]
>>> sheet
pyexcel sheet:
+----------+----------+----------+
| Column 1 | Column 2 | Column 3 |
+==========+==========+==========+
| 1        | 11       | 7        |
+----------+----------+----------+
| 2        | 12       | 8        |
+----------+----------+----------+
| 3        | 13       | 9        |
+----------+----------+----------+
Remove one column of a data file

If you want to remove Column 2, you can just call:

>>> del sheet.column["Column 2"]
>>> sheet.column["Column 3"]
[7, 8, 9]

The sheet content will become:

>>> sheet
pyexcel sheet:
+----------+----------+
| Column 1 | Column 3 |
+==========+==========+
| 1        | 7        |
+----------+----------+
| 2        | 8        |
+----------+----------+
| 3        | 9        |
+----------+----------+

Append more columns to a data file

Continue from previous example. Suppose you want add two more columns to the data file

Column 4 Column 5
10 13
11 14
12 15

Here is the example code to append two extra columns:

>>> extra_data = [
...    ["Column 4", "Column 5"],
...    [10, 13],
...    [11, 14],
...    [12, 15]
... ]
>>> sheet2 = pyexcel.Sheet(extra_data)
>>> sheet.column += sheet2
>>> sheet.column["Column 4"]
[10, 11, 12]
>>> sheet.column["Column 5"]
[13, 14, 15]

Here is what you will get:

>>> sheet
pyexcel sheet:
+----------+----------+----------+----------+
| Column 1 | Column 3 | Column 4 | Column 5 |
+==========+==========+==========+==========+
| 1        | 7        | 10       | 13       |
+----------+----------+----------+----------+
| 2        | 8        | 11       | 14       |
+----------+----------+----------+----------+
| 3        | 9        | 12       | 15       |
+----------+----------+----------+----------+
Cherry pick some columns to be removed

Suppose you have the following data:

>>> data = [
...     ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'],
...     [1,2,3,4,5,6,7,9],
... ]
>>> sheet = pyexcel.Sheet(data, name_columns_by_row=0)
>>> sheet
pyexcel sheet:
+---+---+---+---+---+---+---+---+
| a | b | c | d | e | f | g | h |
+===+===+===+===+===+===+===+===+
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 9 |
+---+---+---+---+---+---+---+---+

And you want to remove columns named as: ‘a’, ‘c, ‘e’, ‘h’. This is how you do it:

>>> del sheet.column['a', 'c', 'e', 'h']
>>> sheet
pyexcel sheet:
+---+---+---+---+
| b | d | f | g |
+===+===+===+===+
| 2 | 4 | 6 | 7 |
+---+---+---+---+

What if the headers are in a different row

Suppose you have the following data:

>>> sheet
pyexcel sheet:
+----------+----------+----------+
| 1        | 2        | 3        |
+----------+----------+----------+
| Column 1 | Column 2 | Column 3 |
+----------+----------+----------+
| 4        | 5        | 6        |
+----------+----------+----------+

The way to name your columns is to use index 1:

>>> sheet.name_columns_by_row(1)

Here is what you get:

>>> sheet
pyexcel sheet:
+----------+----------+----------+
| Column 1 | Column 2 | Column 3 |
+==========+==========+==========+
| 1        | 2        | 3        |
+----------+----------+----------+
| 4        | 5        | 6        |
+----------+----------+----------+

Row manipulation

Suppose you have the following data:

>>> sheet
pyexcel sheet:
+---+---+---+-------+
| a | b | c | Row 1 |
+---+---+---+-------+
| e | f | g | Row 2 |
+---+---+---+-------+
| 1 | 2 | 3 | Row 3 |
+---+---+---+-------+

You can name your rows by column index at 3:

>>> sheet.name_rows_by_column(3)

Then you can access rows by its name:

>>> sheet.row["Row 1"]
['a', 'b', 'c']

Sheet: Data filtering

There are two ways of applying a filter:

  1. soft filtering. use add_filter(), remove_filter() and clear_filters() to interactively apply a filter. The content is not modified until you call freeze_filters()
  2. hard filtering. use filter() function to apply a filter immediately. The content is modified.

Suppose you have the following data in any of the supported excel formats:

Column 1 Column 2 Column 3
1 4 7
2 5 8
3 6 9
>>> import pyexcel
>>> sheet = pyexcel.get_sheet(file_name="example_series.xls", name_columns_by_row=0)

Filter out some data

You may want to filter odd rows and print them in an array of dictionaries:

>>> sheet.add_filter(pyexcel.OddRowFilter())
>>> sheet.to_array()
[['Column 1', 'Column 2', 'Column 3'], [4, 5, 6]]

Let’s try to further filter out even columns:

>>> sheet.add_filter(pyexcel.EvenColumnFilter())
>>> sheet.to_dict()
OrderedDict([('Column 1', [4]), ('Column 3', [6])])
Save the data

Let’s save the previous filtered data:

>>> sheet.save_as("example_series_filter.xls")

When you open example_series_filter.xls, you will find these data

Column 1 Column 3
2 8

The complete code is:

import pyexcel

sheet = pyexcel.get_sheet(file_name="example_series.xls")
sheet.add_filter(pyexcel.OddRowFilter())
sheet.add_filter(pyexcel.EvenColumnFilter())
sheet.save_as("example_series_filter.xls")
How to filter out empty rows in my sheet?

Suppose you have the following data in a sheet and you want to remove those rows with blanks:

>>> import pyexcel as pe
>>> sheet = pe.Sheet([[1,2,3],['','',''],['','',''],[1,2,3]])
>>> sheet
pyexcel sheet:
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
+---+---+---+
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+

You can use pyexcel.filters.RowValueFilter, which examines each row, return True if the row should be filtered out. So, let’s define a filter function:

>>> def filter_row(row):
...     result = [element for element in row if element != '']
...     return len(result)==0

Now, let’s contruct a row value filter

>>> row_value_filter = pe.RowValueFilter(filter_row)

And then apply the filter on the sheet:

>>> sheet.filter(row_value_filter)
>>> sheet
pyexcel sheet:
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+

Work with multi-sheet file

How do I read a book, pocess it and save to a new book

Yes, you can do that. The code looks like this:

import pyexcel

book = pyexcel.get_book(file_name="yourfile.xls")
for sheet in book:
    # do you processing with sheet
    # do filtering?
    pass
book.save_as("output.xls")
What would happen if I save a multi sheet book into “csv” file

Well, you will get one csv file per each sheet. Suppose you have these code:

>>> content = {
...     'Sheet 1':
...         [
...             [1.0, 2.0, 3.0],
...             [4.0, 5.0, 6.0],
...             [7.0, 8.0, 9.0]
...         ],
...     'Sheet 2':
...         [
...             ['X', 'Y', 'Z'],
...             [1.0, 2.0, 3.0],
...             [4.0, 5.0, 6.0]
...         ],
...     'Sheet 3':
...         [
...             ['O', 'P', 'Q'],
...             [3.0, 2.0, 1.0],
...             [4.0, 3.0, 2.0]
...         ]
... }
>>> book = pyexcel.Book(content)
>>> book.save_as("myfile.csv")

You will end up with three csv files:

>>> import glob
>>> outputfiles = glob.glob("myfile_*.csv")
>>> for file in sorted(outputfiles):
...     print(file)
...
myfile__Sheet 1__0.csv
myfile__Sheet 2__1.csv
myfile__Sheet 3__2.csv

and their content is the value of the dictionary at the corresponding key

After I have saved my multiple sheet book in csv format, how do I get them back in pyexcel

First of all, you can read them back individual as csv file using meth:~pyexcel.get_sheet method. Secondly, the pyexcel can do the magic to load all of them back into a book. You will just need to provide the common name before the separator “__”:

>>> book2 = pyexcel.get_book(file_name="myfile.csv")
>>> book2
Sheet 1:
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+
| 7 | 8 | 9 |
+---+---+---+
Sheet 2:
+---+---+---+
| X | Y | Z |
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+
Sheet 3:
+---+---+---+
| O | P | Q |
+---+---+---+
| 3 | 2 | 1 |
+---+---+---+
| 4 | 3 | 2 |
+---+---+---+

Sheet: Data conversion

How to obtain records from an excel sheet

Suppose you want to process the following excel data :

Name Age
Adam 28
Beatrice 29
Ceri 30
Dean 26

Here are the example code:

>>> import pyexcel as pe
>>> records = pe.get_records(file_name="your_file.xls")
>>> for record in records:
...     print("%s is aged at %d" % (record['Name'], record['Age']))
Adam is aged at 28
Beatrice is aged at 29
Ceri is aged at 30
Dean is aged at 26

How to get an array from an excel sheet

Suppose you have a csv, xls, xlsx file as the following:

1 2 3
4 5 6
7 8 9

The following code will give you the data in json:

>>> import pyexcel
>>> # "example.csv","example.xlsx","example.xlsm"
>>> my_array = pyexcel.get_array(file_name="example.xls")
>>> my_array
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

How to save an python array as an excel file

Suppose you have the following array:

>>> data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

And here is the code to save it as an excel file

>>> import pyexcel
>>> pyexcel.save_as(array=data, dest_file_name="example.xls")

Let’s verify it:

>>> pyexcel.get_sheet(file_name="example.xls")
pyexcel_sheet1:
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+
| 7 | 8 | 9 |
+---+---+---+

How to save an python array as a csv file with special delimiter

Suppose you have the following array:

>>> data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

And here is the code to save it as an excel file

>>> import pyexcel
>>> pyexcel.save_as(array=data,
...                 dest_file_name="example.csv",
...                 dest_delimiter=':')

Let’s verify it:

>>> with open("example.csv") as f:
...     for line in f.readlines():
...         print(line.rstrip())
...
1:2:3
4:5:6
7:8:9

How to get a dictionary from an excel sheet

Suppose you have a csv, xls, xlsx file as the following:

Column 1 Column 2 Column 3
1 4 7
2 5 8
3 6 9

The following code will give you data series in a dictionary:

>>> import pyexcel
>>> from pyexcel._compact import OrderedDict
>>> my_dict = pyexcel.get_dict(file_name="example_series.xls", name_columns_by_row=0)
>>> isinstance(my_dict, OrderedDict)
True
>>> for key, values in my_dict.items():
...     print({key: values})
{'Column 1': [1, 4, 7]}
{'Column 2': [2, 5, 8]}
{'Column 3': [3, 6, 9]}

Please note that my_dict is an OrderedDict.

How to obtain a dictionary from a multiple sheet book

Suppose you have a multiple sheet book as the following:

Sheet 1
1 2 3
4 5 6
7 8 9
Sheet 2
X Y Z
1 2 3
4 5 6
Sheet 3
O P Q
3 2 1
4 3 2

Here is the code to obtain those sheets as a single dictionary:

>>> import pyexcel
>>> import json
>>> book_dict = pyexcel.get_book_dict(file_name="book.xls")
>>> isinstance(book_dict, OrderedDict)
True
>>> for key, item in book_dict.items():
...     print(json.dumps({key: item}))
{"Sheet 1": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]}
{"Sheet 2": [["X", "Y", "Z"], [1, 2, 3], [4, 5, 6]]}
{"Sheet 3": [["O", "P", "Q"], [3, 2, 1], [4, 3, 2]]}

How to save a dictionary of two dimensional array as an excel file

Suppose you want to save the below dictionary to an excel file

>>> a_dictionary_of_two_dimensional_arrays = {
...      'Sheet 1':
...          [
...              [1.0, 2.0, 3.0],
...              [4.0, 5.0, 6.0],
...              [7.0, 8.0, 9.0]
...          ],
...      'Sheet 2':
...          [
...              ['X', 'Y', 'Z'],
...              [1.0, 2.0, 3.0],
...              [4.0, 5.0, 6.0]
...          ],
...      'Sheet 3':
...          [
...              ['O', 'P', 'Q'],
...              [3.0, 2.0, 1.0],
...              [4.0, 3.0, 2.0]
...          ]
...  }

Here is the code:

>>> pyexcel.save_book_as(
...    bookdict=a_dictionary_of_two_dimensional_arrays,
...    dest_file_name="book.xls"
... )

If you want to preserve the order of sheets in your dictionary, you have to pass on an ordered dictionary to the function itself. For example:

>>> data = OrderedDict()
>>> data.update({"Sheet 2": a_dictionary_of_two_dimensional_arrays['Sheet 2']})
>>> data.update({"Sheet 1": a_dictionary_of_two_dimensional_arrays['Sheet 1']})
>>> data.update({"Sheet 3": a_dictionary_of_two_dimensional_arrays['Sheet 3']})
>>> pyexcel.save_book_as(bookdict=data, dest_file_name="book.xls")

Let’s verify its order:

>>> book_dict = pyexcel.get_book_dict(file_name="book.xls")
>>> for key, item in book_dict.items():
...     print(json.dumps({key: item}))
{"Sheet 2": [["X", "Y", "Z"], [1, 2, 3], [4, 5, 6]]}
{"Sheet 1": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]}
{"Sheet 3": [["O", "P", "Q"], [3, 2, 1], [4, 3, 2]]}

Please notice that “Sheet 2” is the first item in the book_dict, meaning the order of sheets are preserved.

How to an excel sheet to a database using SQLAlchemy

Note

You can find the complete code of this example in examples folder on github

Before going ahead, let’s import the needed components and initialize sql engine and table base:

>>> from sqlalchemy import create_engine
>>> from sqlalchemy.ext.declarative import declarative_base
>>> from sqlalchemy import Column , Integer, String, Float, Date
>>> from sqlalchemy.orm import sessionmaker
>>> engine = create_engine("sqlite:///birth.db")
>>> Base = declarative_base()
>>> Session = sessionmaker(bind=engine)

Let’s suppose we have the following database model:

>>> class BirthRegister(Base):
...     __tablename__='birth'
...     id=Column(Integer, primary_key=True)
...     name=Column(String)
...     weight=Column(Float)
...     birth=Column(Date)

Let’s create the table:

>>> Base.metadata.create_all(engine)

Now here is a sample excel file to be saved to the table:

name weight birth
Adam 3.4 2015-02-03
Smith 4.2 2014-11-12

Here is the code to import it:

>>> session = Session() # obtain a sql session
>>> pyexcel.save_as(file_name="birth.xls", name_columns_by_row=0, dest_session=session, dest_table=BirthRegister)

Done it. It is that simple. Let’s verify what has been imported to make sure.

>>> sheet = pyexcel.get_sheet(session=session, table=BirthRegister)
>>> sheet
birth:
+------------+----+-------+--------+
| birth      | id | name  | weight |
+------------+----+-------+--------+
| 2015-02-03 | 1  | Adam  | 3.4    |
+------------+----+-------+--------+
| 2014-11-12 | 2  | Smith | 4.2    |
+------------+----+-------+--------+

How to open an xls file and save it as csv

Suppose we want to save previous used example ‘birth.xls’ as a csv file

>>> import pyexcel
>>> pyexcel.save_as(file_name="birth.xls", dest_file_name="birth.csv")

Again it is really simple. Let’s verify what we have gotten:

>>> sheet = pyexcel.get_sheet(file_name="birth.csv")
>>> sheet
birth.csv:
+-------+--------+----------+
| name  | weight | birth    |
+-------+--------+----------+
| Adam  | 3.4    | 03/02/15 |
+-------+--------+----------+
| Smith | 4.2    | 12/11/14 |
+-------+--------+----------+

Note

Please note that csv(comma separate value) file is pure text file. Formula, charts, images and formatting in xls file will disappear no matter which transcoding tool you use. Hence, pyexcel is a quick alternative for this transcoding job.

How to open an xls file and save it as xlsx

Warning

Formula, charts, images and formatting in xls file will disappear as pyexcel does not support Formula, charts, images and formatting.

Let use previous example and save it as ods instead

>>> import pyexcel
>>> pyexcel.save_as(file_name="birth.xls",
...                 dest_file_name="birth.xlsx") # change the file extension

Again let’s verify what we have gotten:

>>> sheet = pyexcel.get_sheet(file_name="birth.xlsx")
>>> sheet
pyexcel_sheet1:
+-------+--------+----------+
| name  | weight | birth    |
+-------+--------+----------+
| Adam  | 3.4    | 03/02/15 |
+-------+--------+----------+
| Smith | 4.2    | 12/11/14 |
+-------+--------+----------+

How to open a xls multiple sheet excel book and save it as csv

Well, you write similiar codes as before but you will need to use :meth:~pyexcel.save_book_as function.

Sheet: Formatting

Previous section has assumed the data is in the format that you want. In reality, you have to manipulate the data types a bit to suit your needs. Hence, formatters comes into the scene. The formatters take effect when the data is read on the fly. They do not affect the persistence of the data in the excel files. A row or column formatter can be applied to mutilpe rows/columns. There are two ways of applying a formatter:

  1. use add_formatter(), remove_formatter() and clear_formatter() to apply formatter on the fly. The formatter takes effect when a cell value is read. In other words, the sheet content is intact until you call freeze_formatters() to apply all added formatters.
  2. use format() to apply formatter immediately.

There is slightly different behavior between csv reader and xls reader. The cell type of the cells read by csv reader will be always text while the cell types read by xls reader vary.

Convert a column of numbers to strings

By default, all values in csv are read back as texts. However, for xls, xlsx and xlsm files, differnt data type are supported. Numbers are always read as float. Therefore, if you should like to have them in string format, you need to do some conversions. Suppose you have the following data in any of the supported excel formats:

userid name
10120 Adam
10121 Bella
10122 Cedar

Let’s read it out first:

>>> import pyexcel
>>> sheet = pyexcel.get_sheet(file_name="example.xls", name_columns_by_row=0)
>>> sheet.column["userid"]
[10120, 10121, 10122]

As you can see, userid column is of float type. Next, let’s convert the column to string format:

>>> sheet.column.format(0, str)
>>> sheet.column["userid"]
['10120', '10121', '10122']

Now, they are in string format.

You can do this row by row as well using RowFormatter or do this to a whote spread sheet using SheetFormatter

Cleanse the cells in a spread sheet

Sometimes, the data in a spreadsheet may have unwanted strings in all or some cells. Let’s take an example. Suppose we have a spread sheet that contains all strings but it as random spaces before and after the text values. Some field had weird characters, such as “  ”:

Version Comments Author  
v0.0.1 Release versions  Eda
  v0.0.2 Useful updates      Freud

First, let’s read the content and see what do we have:

>>> sheet = pyexcel.get_sheet(file_name="example.xls")
>>> sheet.to_array()
[['        Version', '        Comments', '       Author  '], ['  v0.0.1       ', ' Release versions', '            Eda'], ['  v0.0.2  ', 'Useful updates    ', '   Freud']]

Now try to create a custom cleanse function:

>>> def cleanse_func(v):
...     v = v.replace(" ", "")
...     v = v.rstrip().strip()
...     return v
...

Then let’s create a SheetFormatter and apply it:

>>> sf = pyexcel.formatters.SheetFormatter(cleanse_func)
>>> sheet.add_formatter(sf)
>>> sheet.to_array()
[['Version', 'Comments', 'Author'], ['v0.0.1', 'Release versions', 'Eda'], ['v0.0.2', 'Useful updates', 'Freud']]

So in the end, you get this:

Version Comments Author
v0.0.1 Release versions Eda
v0.0.2 Useful updates Freud

Book: Sheet operations

Access to individual sheets

You can access individual sheet of a book via attribute:

>>> book = pyexcel.get_book(file_name="book.xls")
>>> book.sheet3
sheet3:
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+
| 7 | 8 | 9 |
+---+---+---+

or via array notations:

>>> book["sheet 1"] # there is a space in the sheet name
sheet 1:
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 4 | 5 | 6 |
+---+---+---+

Merge excel books

Suppose you have two excel books and each had three sheets. You can merge them and get a new book:

You also can merge indivdual sheets:

>>> book1 = pyexcel.get_book(file_name="book1.xls")
>>> book2 = pyexcel.get_book(file_name="book2.xlsx")
>>> merged_book = book1 + book2
>>> merged_book = book1["Sheet 1"] + book2["Sheet 2"]
>>> merged_book = book1["Sheet 1"] + book2
>>> merged_book = book1 + book2["Sheet 2"]

Manipulate individual sheets

merge sheets into a single sheet

Suppose you want to merge many csv files row by row into a new sheet.

>>> import pyexcel as pe
>>> import glob
>>> merged = pyexcel.Sheet()
>>> for file in glob.glob("*.csv"):
...     merged.row += pe.get_sheet(file_name=file)
>>> merged.save_as("merged.csv")

Work with excel files

Warning

The pyexcel DOES NOT consider Fonts, Styles, Formulas and Charts at all. When you load a stylish excel and update it, you definitely will lose all those.

Add a new row to an existing file

Suppose you have one data file as the following:

example.xls

Column 1 Column 2 Column 3
1 4 7
2 5 8
3 6 9

And you want to add a new row:

12, 11, 10

Here is the code:

>>> import pyexcel as pe
>>> sheet = pe.get_sheet(file_name="example.xls")
>>> sheet.row += [12, 11, 10]
>>> sheet.save_as("new_example.xls")
>>> pe.get_sheet(file_name="new_example.xls")
pyexcel sheet:
+----------+----------+----------+
| Column 1 | Column 2 | Column 3 |
+----------+----------+----------+
| 1        | 4        | 7        |
+----------+----------+----------+
| 2        | 5        | 8        |
+----------+----------+----------+
| 3        | 6        | 9        |
+----------+----------+----------+
| 12       | 11       | 10       |
+----------+----------+----------+

Update an existing row to an existing file

Suppose you want to update the last row of the example file as:

[‘N/A’, ‘N/A’, ‘N/A’]

Here is the sample code:

>>> import pyexcel as pe
>>> sheet = pe.get_sheet(file_name="example.xls")
>>> sheet.row[3] = ['N/A', 'N/A', 'N/A']
>>> sheet.save_as("new_example1.xls")
>>> pe.get_sheet(file_name="new_example1.xls")
pyexcel sheet:
+----------+----------+----------+
| Column 1 | Column 2 | Column 3 |
+----------+----------+----------+
| 1        | 4        | 7        |
+----------+----------+----------+
| 2        | 5        | 8        |
+----------+----------+----------+
| N/A      | N/A      | N/A      |
+----------+----------+----------+

Add a new column to an existing file

And you want to add a column instead:

[“Column 4”, 10, 11, 12]

Here is the code:

>>> import pyexcel as pe
>>> sheet = pe.get_sheet(file_name="example.xls")
>>> sheet.column += ["Column 4", 10, 11, 12]
>>> sheet.save_as("new_example2.xls")
>>> pe.get_sheet(file_name="new_example2.xls")
pyexcel sheet:
+----------+----------+----------+----------+
| Column 1 | Column 2 | Column 3 | Column 4 |
+----------+----------+----------+----------+
| 1        | 4        | 7        | 10       |
+----------+----------+----------+----------+
| 2        | 5        | 8        | 11       |
+----------+----------+----------+----------+
| 3        | 6        | 9        | 12       |
+----------+----------+----------+----------+

Update an existing column to an existing file

Again let’s update “Column 3” with:

[100, 200, 300]

Here is the sample code:

>>> import pyexcel as pe
>>> sheet = pe.get_sheet(file_name="example.xls")
>>> sheet.column[2] = ["Column 3", 100, 200, 300]
>>> sheet.save_as("new_example3.xls")
>>> pe.get_sheet(file_name="new_example3.xls")
pyexcel sheet:
+----------+----------+----------+
| Column 1 | Column 2 | Column 3 |
+----------+----------+----------+
| 1        | 4        | 100      |
+----------+----------+----------+
| 2        | 5        | 200      |
+----------+----------+----------+
| 3        | 6        | 300      |
+----------+----------+----------+

Alternatively, you could have done like this:

>>> import pyexcel as pe
>>> sheet = pe.get_sheet(file_name="example.xls", name_columns_by_row=0)
>>> sheet.column["Column 3"] = [100, 200, 300]
>>> sheet.save_as("new_example4.xls")
>>> pe.get_sheet(file_name="new_example4.xls")
pyexcel sheet:
+----------+----------+----------+
| Column 1 | Column 2 | Column 3 |
+----------+----------+----------+
| 1        | 4        | 100      |
+----------+----------+----------+
| 2        | 5        | 200      |
+----------+----------+----------+
| 3        | 6        | 300      |
+----------+----------+----------+

How about the same alternative solution to previous row based example? Well, you’d better to have the following kind of data

row_example.xls

Row 1 1 2 3
Row 2 4 5 6
Row 3 7 8 9

And then you want to update “Row 3” with for example:

[100, 200, 300]

These code would do the job:

>>> import pyexcel as pe
>>> sheet = pe.get_sheet(file_name="row_example.xls", name_rows_by_column=0)
>>> sheet.row["Row 3"] = [100, 200, 300]
>>> sheet.save_as("new_example5.xls")
>>> pe.get_sheet(file_name="new_example5.xls")
pyexcel sheet:
+-------+-----+-----+-----+
| Row 1 | 1   | 2   | 3   |
+-------+-----+-----+-----+
| Row 2 | 4   | 5   | 6   |
+-------+-----+-----+-----+
| Row 3 | 100 | 200 | 300 |
+-------+-----+-----+-----+

Work with excel files in memory

Excel files in memory can be manipulated directly without saving it to physical disk and vice versa. This is useful in excel file handling at file upload or in excel file download. For example:

>>> import pyexcel

>>> content = "1,2,3\n3,4,5"
>>> sheet = pyexcel.get_sheet(file_type="csv", file_content=content)
>>> sheet.format(int)
>>> print(sheet.to_array())
[[1, 2, 3], [3, 4, 5]]

Read any supported excel and respond its content in json

You can find a real world example in examples/memoryfile/ directory: pyexcel_server.py. Here is the example snippet

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
def upload():
    if request.method == 'POST' and 'excel' in request.files:
        # handle file upload
        filename = request.files['excel'].filename
        extension = filename.split(".")[1]
        # Obtain the file extension and content
        # pass a tuple instead of a file name
        sheet = pyexcel.load_from_memory(extension, request.files['excel'].read())
        # then use it as usual
        data = pyexcel.to_dict(sheet)
        # respond with a json
        return jsonify({"result":data})
    return render_template...

request.files[‘excel’] in line 4 holds the file object. line 5 finds out the file extension. line 8 feeds in a tuple to Book. line 10 gives a dictionary represetation of the excel file and line 11 send the json represetation of the excel file back to client browser

Write to memory and respond to download

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
data = [
    [...],
    ...
]

@app.route('/download')
def download():
    sheet = pe.Sheet(data)
    io = StringIO()
    sheet.save_to_memory("csv", io)
    output = make_response(io.getvalue())
    output.headers["Content-Disposition"] = "attachment; filename=export.csv"
    output.headers["Content-type"] = "text/csv"
    return output

make_response is a Flask utility to make a memory content as http response.

Note

You can find the corresponding source code at examples/memoryfile

Relevant packages

Readily made plugins have been made on top of this example. Here is a list of them:

framework plugin/middleware/extension
Flask Flask-Excel
Django django-excel
Pyramid pyramid-excel

And you may make your own by using pyexcel-webio

Migrate from 0.2.1 to 0.2.2

1. Explicit imports, no longer needed

Please forget about these statements:

import pyexcel.ext.xls
import pyexcel.ext.ods
import pyexcel.ext.xlsx

They are no longer needed. As long as you have pip-installed them, they will be auto-loaded. However, if you do not want some of the plugins, please use pip to uninstall them.

What if you have your code as it is? No harm but a few warnings shown:

Deprecated usage since v0.2.2! Explicit import is no longer required. pyexcel.ext.ods is auto imported.

2. Invalid environment marker: platform_python_implementation==”PyPy”

Yes, it is a surprise. Please upgrade setuptools in your environment:

pip install --upgrade setuptools

At the time of writing, setuptools (18.0.1) or setuptools-21.0.0-py2.py3-none-any.whl is installed on author’s computer and worked.

3. How to keep both pyexcel-xls and pyexcel-xlsx

As in Issue 20, pyexcel-xls was used for xls and pyexcel-xlsx had to be used for xlsx. Both must co-exist due to requirements. The workaround would failed when auto-import are enabled in v0.2.2. Hence, user of pyexcel in this situation shall use ‘library’ parameter to all signature functions, to instruct pyexcel to use a named library for each function call.

4. pyexcel.get_io is no longer exposed

pyexcel.get_io was passed on from pyexcel-io. However, it is no longer exposed. Please use pyexcel_io.manager.RWManager.get_io if you have to.

You are likely to use pyexcel.get_io when you do pyexcel.Sheet.save_to_memory() or pyexcel.Book.save_to_memory() where you need to put in a io stream. But actually, with latest code, you could put in a None.

Migrate from 0.1.x to 0.2.x

1. “Writer” is gone, Please use save_as.

Here is a piece of legacy code:

w = pyexcel.Writer("afile.csv")
data=[['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 1.1, 1]]
w.write_array(table)
w.close()

The new code is:

>>> data=[['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 1.1, 1]]
>>> pyexcel.save_as(array=data, dest_file_name="afile.csv")

Here is another piece of legacy code:

content = {
    "X": [1,2,3,4,5],
    "Y": [6,7,8,9,10],
    "Z": [11,12,13,14,15],
}
w = pyexcel.Writer("afile.csv")
w.write_dict(self.content)
w.close()

The new code is:

>>> content = {
...     "X": [1,2,3,4,5],
...     "Y": [6,7,8,9,10],
...     "Z": [11,12,13,14,15],
... }
>>> pyexcel.save_as(adict=content, dest_file_name="afile.csv")

Here is yet another piece of legacy code:

data = [
    [1, 2, 3],
    [4, 5, 6]
]
io = StringIO()
w = pyexcel.Writer(("csv",io))
w.write_rows(data)
w.close()

The new code is:

>>> data = [
...     [1, 2, 3],
...     [4, 5, 6]
... ]
>>> io = pyexcel.save_as(dest_file_type='csv', array=data)
>>> for line in io.readlines():
...     print(line.rstrip())
1,2,3
4,5,6

2. “BookWriter” is gone. Please use save_book_as.

Here is a piece of legacy code:

import pyexcel
content = {
         "Sheet1": [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
         "Sheet2": [[4, 4, 4, 4], [5, 5, 5, 5], [6, 6, 6, 6]],
         "Sheet3": [[u'X', u'Y', u'Z'], [1, 4, 7], [2, 5, 8], [3, 6, 9]]
     }
w = pyexcel.BookWriter("afile.csv")
w.write_book_from_dict(content)
w.close()

The replacement code is:

>>> import pyexcel
>>> content = {
...          "Sheet1": [[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]],
...          "Sheet2": [[4, 4, 4, 4], [5, 5, 5, 5], [6, 6, 6, 6]],
...          "Sheet3": [[u'X', u'Y', u'Z'], [1, 4, 7], [2, 5, 8], [3, 6, 9]]
...      }
>>> pyexcel.save_book_as(bookdict=content, dest_file_name="afile.csv")

Cook book

Recipies

Warning

The pyexcel DOES NOT consider Fonts, Styles and Charts at all. In the resulting excel files, fonts, styles and charts will not be transferred.

These recipies give a one-stop utility functions for known use cases. Simliar functionality can be achieved using other application interfaces.

Update one column of a data file

Suppose you have one data file as the following:

example.xls

Column 1 Column 2 Column 3
1 4 7
2 5 8
3 6 9

And you want to update Column 2 with these data: [11, 12, 13]

Here is the code:

>>> from pyexcel.cookbook import update_columns
>>> custom_column = {"Column 2":[11, 12, 13]}
>>> update_columns("example.xls", custom_column, "output.xls")

Your output.xls will have these data:

Column 1 Column 2 Column 3
1 11 7
2 12 8
3 13 9

Update one row of a data file

Suppose you have the same data file:

example.xls

Row 1 1 2 3
Row 2 4 5 6
Row 3 7 8 9

And you want to update the second row with these data: [7, 4, 1]

Here is the code:

>>> from pyexcel.cookbook import update_rows
>>> custom_row = {"Row 1":[11, 12, 13]}
>>> update_rows("example.xls", custom_row, "output.xls")

Your output.xls will have these data:

Column 1 Column 2 Column 3
7 4 1
2 5 8
3 6 9

Merge two files into one

Suppose you want to merge the following two data files:

example.csv

Column 1 Column 2 Column 3
1 4 7
2 5 8
3 6 9

example.xls

Column 4 Column 5
10 12
11 13

The following code will merge the tow into one file, say “output.xls”:

>>> from pyexcel.cookbook import merge_two_files
>>> merge_two_files("example.csv", "example.xls", "output.xls")

The output.xls would have the following data:

Column 1 Column 2 Column 3 Column 4 Column 5
1 4 7 10 12
2 5 8 11 13
3 6 9    

Select candidate columns of two files and form a new one

Suppose you have these two files:

example.ods

Column 1 Column 2 Column 3 Column 4 Column 5
1 4 7 10 13
2 5 8 11 14
3 6 9 12 15

example.xls

Column 6 Column 7 Column 8 Column 9 Column 10
16 17 18 19 20
>>> data = [
...      ["Column 1", "Column 2", "Column 3", "Column 4", "Column 5"],
...      [1, 4, 7, 10, 13],
...      [2, 5, 8, 11, 14],
...      [3, 6, 9, 12, 15]
...  ]
>>> s = pyexcel.Sheet(data)
>>> s.save_as("example.csv")
>>> data = [
...      ["Column 6", "Column 7", "Column 8", "Column 9", "Column 10"],
...      [16, 17, 18, 19, 20]
...  ]
>>> s = pyexcel.Sheet(data)
>>> s.save_as("example.xls")

And you want to filter out column 2 and 4 from example.ods, filter out column 6 and 7 and merge them:

Column 1 Column 3 Column 5 Column 8 Column 9 Column 10
1 7 13 18 19 20
2 8 14      
3 9 15      

The following code will do the job:

>>> from pyexcel.cookbook import merge_two_readers
>>> from pyexcel.filters import EvenColumnFilter, ColumnFilter
>>> sheet1 = pyexcel.get_sheet(file_name="example.csv", name_columns_by_row=0)
>>> sheet2 = pyexcel.get_sheet(file_name="example.xls", name_columns_by_row=0)
>>> sheet1.filter(pyexcel.EvenColumnFilter())
>>> sheet2.filter(pyexcel.ColumnFilter([0, 1]))
>>> merge_two_readers(sheet1, sheet2, "output.xls")

Merge two files into a book where each file become a sheet

Suppose you want to merge the following two data files:

example.csv

Column 1 Column 2 Column 3
1 4 7
2 5 8
3 6 9

example.xls

Column 4 Column 5
10 12
11 13
>>> data = [
...      ["Column 1", "Column 2", "Column 3"],
...      [1, 2, 3],
...      [4, 5, 6],
...      [7, 8, 9]
...  ]
>>> s = pyexcel.Sheet(data)
>>> s.save_as("example.csv")
>>> data = [
...      ["Column 4", "Column 5"],
...      [10, 12],
...      [11, 13]
...  ]
>>> s = pyexcel.Sheet(data)
>>> s.save_as("example.xls")

The following code will merge the tow into one file, say “output.xls”:

>>> from pyexcel.cookbook import merge_all_to_a_book
>>> merge_all_to_a_book(["example.csv", "example.xls"], "output.xls")

The output.xls would have the following data:

example.csv as sheet name and inside the sheet, you have:

Column 1 Column 2 Column 3
1 4 7
2 5 8
3 6 9

example.ods as sheet name and inside the sheet, you have:

Column 4 Column 5
10 12
11 13

Merge all excel files in directory into a book where each file become a sheet

The following code will merge every excel files into one file, say “output.xls”:

from pyexcel.cookbook import merge_all_to_a_book
import glob


merge_all_to_a_book(glob.glob("your_csv_directory\*.csv"), "output.xls")

You can mix and match with other excel formats: xls, xlsm and ods. For example, if you are sure you have only xls, xlsm, xlsx, ods and csv files in your_excel_file_directory, you can do the following:

from pyexcel.cookbook import merge_all_to_a_book
import glob


merge_all_to_a_book(glob.glob("your_excel_file_directory\*.*"), "output.xls")

Split a book into single sheet files

Suppose you have many sheets in a work book and you would like to separate each into a single sheet excel file. You can easily do this:

>>> from pyexcel.cookbook import split_a_book
>>> split_a_book("megabook.xls", "output.xls")
>>> import glob
>>> outputfiles = glob.glob("*_output.xls")
>>> for file in sorted(outputfiles):
...     print(file)
...
Sheet 1_output.xls
Sheet 2_output.xls
Sheet 3_output.xls

for the output file, you can specify any of the supported formats

Extract just one sheet from a book

Suppose you just want to extract one sheet from many sheets that exists in a work book and you would like to separate it into a single sheet excel file. You can easily do this:

>>> from pyexcel.cookbook import extract_a_sheet_from_a_book
>>> extract_a_sheet_from_a_book("megabook.xls", "Sheet 1", "output.xls")
>>> if os.path.exists("Sheet 1_output.xls"):
...     print("Sheet 1_output.xls exists")
...
Sheet 1_output.xls exists

for the output file, you can specify any of the supported formats

Loading from other sources

How to load a sheet from a url

Suppose you have excel file somewhere hosted:

>>> sheet = pe.get_sheet(url='http://yourdomain.com/test.csv')
>>> sheet
csv:
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+

API documentation

API Reference

This is intended for users of pyexcel.

Note

sphinx on ReadTheDocs cannot produce api docs. Please read it from [pypi](http://pythonhosted.org/pyexcel/api.html)

Signature functions

These flags can be passed on to control plugin behaviors:

auto_detect_int

Automatically convert float values to integers if the float number has no decimal values(e.g. 1.00). By default, it does the detection. Setting it to False will turn on this behavior

It has no effect on pyexcel-xlsx because it does that by default.

auto_detect_float

Automatically convert text to float values if possible. This applies only pyexcel-io where csv, tsv, csvz and tsvz formats are supported. By default, it does the detection. Setting it to False will turn on this behavior

auto_detect_datetime

Automatically convert text to python datetime if possible. This applies only pyexcel-io where csv, tsv, csvz and tsvz formats are supported. By default, it does the detection. Setting it to False will turn on this behavior

library

Name a pyexcel plugin to handle a file format. In the situation where multiple plugins were pip installed, it is confusing for pyexcel on which plugin to handle the file format. For example, both pyexcel-xlsx and pyexcel-xls reads xlsx format. Now since version 0.2.2, you can pass on library=”pyexcel-xls” to handle xlsx in a specific function call.

Alternatively, you could uninstall the unwanted pyexcel plugin using pip.

Obtaining data from excel file
get_array(**keywords) Obtain an array from an excel source
get_dict([name_columns_by_row]) Obtain a dictionary from an excel source
get_records([name_columns_by_row]) Obtain a list of records from an excel source
get_book_dict(**keywords) Obtain a dictionary of two dimensional arrays
get_book(**keywords) Get an instance of Book from an excel source
get_sheet(**keywords) Get an instance of Sheet from an excel source
Saving data to excel file
save_as(**keywords) Save a sheet from a data srouce to another one
save_book_as(**keywords) Save a book from a data source to another one

Cookbook

merge_csv_to_a_book(filelist[, outfilename]) merge a list of csv files into a excel book
merge_all_to_a_book(filelist[, outfilename]) merge a list of excel files into a excel book
split_a_book(file_name[, outfilename]) Split a file into separate sheets
extract_a_sheet_from_a_book(file_name, sheetname) Extract a sheet from a excel book

Book

Here’s the entity relationship between Book, Sheet, Row and Column

_images/entity-relationship-diagram.png
Constructor
Book([sheets, filename, path]) Read an excel book that has one or more sheets
Attribute
Book.number_of_sheets() Return the number of sheets
Book.sheet_names() Return all sheet names
Conversions
Book.to_dict() Convert the book to a dictionary
Save changes
Book.save_to(source) Save to a writeable data source
Book.save_as(filename) Save the content to a new file
Book.save_to_memory(file_type[, stream]) Save the content to a memory stream
Book.save_to_database(session, tables[, ...]) Save data in sheets to database tables

Sheet

Constructor
Sheet([sheet, name, name_columns_by_row, ...]) Two dimensional data container for filtering, formatting and iteration
Save changes
Sheet.save_to(source) Save to a writeable data source
Sheet.save_as(filename, **keywords) Save the content to a named file
Sheet.save_to_memory(file_type[, stream]) Save the content to memory
Sheet.save_to_database(session, table[, ...]) Save data in sheet to database table
Attributes
Sheet.row Row representation.
Sheet.column Column representation.
Sheet.number_of_rows() Number of rows in the data sheet
Sheet.number_of_columns() Number of columns in the data sheet
Sheet.row_range() Utility function to get row range
Sheet.column_range() Utility function to get column range
Iteration
Sheet.rows() Returns a top to bottom row iterator
Sheet.rrows() Returns a bottom to top row iterator
Sheet.columns() Returns a left to right column iterator
Sheet.rcolumns() Returns a right to left column iterator
Sheet.enumerate() Iterate cell by cell from top to bottom and from left to right
Sheet.reverse() Opposite to enumerate
Sheet.vertical() Default iterator to go through each cell one by one from
Sheet.rvertical() Default iterator to go through each cell one by one from rightmost
Cell access
Sheet.cell_value(row, column[, new_value]) Random access to the data cells
Sheet.__getitem__(aset)
Row access
Sheet.row_at(index) Gets the data at the specified row
Sheet.set_row_at(row_index, data_array[, ...]) Update a row data range
Sheet.delete_rows(row_indices) Delete one or more rows
Sheet.extend_rows(rows) Take ordereddict to extend named rows
Column access
Sheet.column_at(index) Gets the data at the specified column
Sheet.set_column_at(column_index, data_array) Updates a column data range
Sheet.delete_columns(column_indices) Delete one or more columns
Sheet.extend_columns(columns) Take ordereddict to extend named columns
Data series
Any column as row name
Sheet.name_columns_by_row(row_index) Use the elements of a specified row to represent individual columns
Sheet.rownames Return row names
Sheet.named_column_at(name) Get a column by its name
Sheet.set_named_column_at(name, column_array) Take the first row as column names
Sheet.delete_named_column_at(name) Works only after you named columns by a row
Any row as column name
Sheet.name_rows_by_column(column_index) Use the elements of a specified column to represent individual rows
Sheet.colnames Return column names
Sheet.named_row_at(name) Get a row by its name
Sheet.set_named_row_at(name, row_array) Take the first column as row names
Sheet.delete_named_row_at(name) Take the first column as row names
Formatting
Sheet.format(formatter[, on_demand]) Apply a formatting action for the whole sheet
Sheet.apply_formatter(aformatter) Apply the formatter immediately.
Sheet.add_formatter(aformatter) Add a lazy formatter.
Sheet.remove_formatter(aformatter) Remove a formatter
Sheet.clear_formatters() Clear all formatters
Sheet.freeze_formatters() Apply all added formatters and clear them
Filtering
Sheet.filter(afilter) Apply the filter with immediate effect
Sheet.add_filter(afilter) Apply a filter
Sheet.remove_filter(afilter) Remove a named filter
Sheet.clear_filters() Clears all filters
Sheet.freeze_filters() Apply all filters and delete them
Conversion
Sheet.to_array() Returns an array after filtering
Sheet.to_dict([row]) Returns a dictionary
Sheet.to_records([custom_headers]) Returns the content as an array of dictionaries
Anti-conversion
dict_to_array(*arg, **keywords)
from_records(*arg, **keywords)
Transformation
Sheet.transpose() Roate the data table by 90 degrees
Sheet.map(custom_function) Execute a function across all cells of the sheet
Sheet.region(topleft_corner, bottomright_corner) Get a rectangle shaped data out
Sheet.cut(topleft_corner, bottomright_corner) Get a rectangle shaped data out and clear them in position
Sheet.paste(topleft_corner[, rows, columns]) Paste a rectangle shaped data after a position

Row access

NamedRow(matrix) Series Sheet would have Named Row instead of Row
NamedRow.format([row_index, formatter, ...]) Format a row
NamedRow.select(names) Delete row indices other than specified

Column access

NamedColumn(matrix) Series Sheet would have Named Column instead of Column
NamedColumn.format([column_index, ...]) Format a column
NamedColumn.select(names) Delete columns other than specified

Data formatters

ColumnFormatter(column_index, formatter) Apply formatting on columns
NamedColumnFormatter(column_index, formatter) Apply formatting using named columns
RowFormatter(row_index, formatter) Row Formatter
NamedRowFormatter(row_index, formatter) Formatting rows using named rows
SheetFormatter(formatter) Apply the formatter to all cells in the sheet

Data Filters

ColumnFilter(indices) Filters out a list of columns
SingleColumnFilter(index) Filters out a single column index
OddColumnFilter() Filters out odd indexed columns
EvenColumnFilter() Filters out even indexed columns
ColumnValueFilter(func) Filters out rows based on its row values
RowFilter(indices) Filters a list of rows
SingleRowFilter(index) Filters out a single row
OddRowFilter() Filters out odd indexed rows
EvenRowFilter() Filters out even indexed rows
RowValueFilter(func) Filters out rows based on its row values
RegionFilter(row_slice, column_slice) Filter on both row index and column index

Internal API reference

This is intended for developers and hackers of pyexcel.

Data sheet representation

In inheritance order from parent to child

Matrix(array) The internal representation of a sheet data.
FormattableSheet(array) A represetation of Matrix that accept custom formatters
FilterableSheet(sheet) A represetation of Matrix that can be filtered
NominableSheet([sheet, name, ...]) Allow dictionary group of the content
Sheet([sheet, name, name_columns_by_row, ...]) Two dimensional data container for filtering, formatting and iteration

Row represetation

Row(matrix) Represet row of a matrix

Column represetation

Column(matrix) Represet columns of a matrix

Developer’s guide

Developer’s guide

Here’s the architecture of pyexcel

_images/architecture.png

Pull requests are welcome.

Development steps for code changes

  1. git clone https://github.com/pyexcel/pyexcel.git
  2. cd pyexcel

Upgrade your setup tools and pip. They are needed for development and testing only:

  1. pip install –upgrade setuptools “pip==7.1”

Then install relevant development requirements:

  1. pip install -r rnd_requirements.txt # if such a file exists
  2. pip install -r requirements.txt
  3. pip install -r tests/requirements.txt

In order to update test environment, and documentation, additional setps are required:

  1. pip install moban
  2. git clone https://github.com/pyexcel/pyexcel-commons.git
  3. make your changes in .moban.d directory, then issue command moban

What is rnd_requirements.txt

Usually, it is created when a dependent library is not released. Once the dependecy is installed(will be released), the future version of the dependency in the requirements.txt will be valid.

What is pyexcel-commons

Many information that are shared across pyexcel projects, such as: this developer guide, license info, etc. are stored in pyexcel-commons project.

What is .moban.d

.moban.d stores the specific meta data for the library.

How to test your contribution

Although nose and doctest are both used in code testing, it is adviable that unit tests are put in tests. doctest is incorporated only to make sure the code examples in documentation remain valid across different development releases.

On Linux/Unix systems, please launch your tests like this:

$ make test

On Windows systems, please issue this command:

> test.bat

Acceptance criteria

  1. Has fair amount of documentation
  2. Has Test cases written
  3. Has all code lines tested
  4. Passes all Travis CI builds
  5. Pythonic code please
  6. Agree on NEW BSD License for your contribution

Change log

Change log

0.2.5 - 31.08.2016

Updated:
  1. # 58: texttable should have been made as compulsory requirement

0.2.4 - 14.07.2016

Updated:
  1. For python 2, writing to sys.stdout by pyexcel-cli raise IOError.

0.2.3 - 11.07.2016

Updated:
  1. For python 3, do not seek 0 when saving to memory if sys.stdout is passed on. Hence, adding support for sys.stdin and sys.stdout.

0.2.2 - 01.06.2016

Updated:
  1. Explicit imports, no longer needed
  2. Depends on latest setuptools 18.0.1
  3. NotImplementedError will be raised if parameters to core functions are not supported, e.g. get_sheet(cannot_find_me_option=”will be thrown out as NotImplementedError”)

0.2.1 - 23.04.2016

Added:
  1. add pyexcel-text file types as attributes of pyexcel.Sheet and pyexcel.Book, related to issue 31
  2. auto import pyexcel-text if it is pip installed
Updated:
  1. code refactored sources for easy addition of sources.
  2. bug fix issue 29, Even if the format is a string it is displayed as a float
  3. pyexcel-text is no longer a plugin to pyexcel-io but to pyexcel.sources, see pyexcel-text issue #22
Removed:
  1. pyexcel.presentation is removed. No longer the internal decorate @outsource is used. related to issue 31

0.2.0 - 17.01.2016

Updated
  1. adopt pyexcel-io yield key word to return generator as content
  2. pyexcel.save_as and pyexcel.save_book_as get performance imporvements

Indices and tables

Footnotes

[1]zipped csv file
[2]zipped tsv file