Overview¶
datasheets is a library for interfacing with Google Sheets, including reading data from, writing data to, and modifying the formatting of Google Sheets. It is built on top of Google’s google-api-python-client and oauth2client libraries using the Google Drive v3 and Google Sheets v4 REST APIs.
It can be installed with pip via pip install datasheets
.
Basic Usage¶
Get the necessary OAuth credentials from the Google Developer Console as described in Getting OAuth Credentials.
After that, using datasheets looks like:
import datasheets
# Create a data set to upload
import pandas as pd
df = pd.DataFrame([('a', 1.3), ('b', 2.7), ('c', 3.9)], columns=['letter', 'number'])
client = datasheets.Client()
workbook = client.create_workbook('my_new_workbook')
tab = workbook.create_tab('my_new_tab')
# Upload a data set
tab.insert_data(df, index=False)
# Fetch the data again
df_again = tab.fetch_data()
# Show workbooks you have access to; this may be slow if you are shared on many workbooks
client.fetch_workbooks_info()
# Show tabs within a given workbook
workbook.fetch_tab_names()
Documentation Contents¶
Functionality¶
Note that before you can use datasheets you will need to set up your access keys as described in Getting OAuth Credentials.
Workbook Interactions¶
See all workbooks that exist¶
client = datasheets.Client() client.fetch_workbooks_info() # Optionally limit to a specific folder client.fetch_workbooks_info(folder='Finance Reports')This produces a pandas.DataFrame:
![]()
See all folders that exist¶
client.fetch_folders() # Optionally limit to only folders you own client.fetch_folders(only_mine=True)
Get a workbook¶
workbook = client.fetch_workbook('Marketing Projections')
Create a workbook¶
client.create_workbook('Q2 Roadmap')
Delete a workbook¶
client.delete_workbook('Q2 Roadmap')
Get a workbook’s URL¶
workbook.url
Tab Interactions¶
See all tabs within a workbook¶
Get a tab¶
workbook.fetch_tab('Radio Spend')
Create a tab¶
workbook.create_tab('Podcast Performance')
Delete a tab¶
workbook.delete_tab('Podcast Performance')
Sharing and Unsharing¶
Data Interactions¶
Get all data in a tab¶
tab = workbook.fetch_tab('Radio Spend') df = tab.fetch_data() # Optionally return a dict of rows, where the keys are the values of the first row of cells data = tab.fetch_data(fmt='dict') # Or return a list of headers (the values of the first row) and a list of rows data = tab.fetch_data(fmt='list')
Add data to a tab¶
# Clear all data in a tab and replace it with a new data set tab.insert_data(df) # Add more data to a tab tab.append_data(df) # For either command, optionally skip uploading the index for the DataFrame tab.insert_data(df, index=False)In addition, by setting
autoformat=True
within theinsert_data
orappend_data
methods the data set will be formatted within the tab as shown below:![]()
The dimensions of the tab match the data set, the headers are formatted nicely, all cells are left-aligned, and column widths are auto-scaled to fit their contents.
Remove all data from a tab¶
tab.clear_data(df)
Tab Formatting Interactions¶
# Add rows tab.add_rows(30) # Add columns tab.add_columns(5) # Change horizontal and vertical alignment of all cells in tab tab.align_cells(horizontal='LEFT', vertical='MIDDLE') # Alter the number of rows or columns in a tab; if new dimensions are smaller # than current values the tab will be trimmed down to that size tab.alter_dimensions(nrows=25, ncols=10) # Shortcut to color headers dark gray, set all cells to use Proxima Nova size 10, left-align # and middle-align all cells, resize columns to fit their data, and eliminate empty columns # and rows from tab tab.autoformat(n_header_rows=2) # Resize widths of all columns in a tab to fit their data tab.autosize_columns() # Change font and font size for all cells in a tab tab.format_font(font='Proxima Nova', size=10) # Set header rows in a tab to be dark gray with off-white text, font Proxima Nova size 10, # left-aligned and middle-aligned, and rows will be made "frozen" so that when the user # scrolls these rows stay visible tab.format_headers(nrows=3)In addition, anything not explicitly supported by the datasheets library as a stand-alone method can be accomplished using the Workbook.batch_update method and referencing Google Sheets’ spreadsheets.batchUpdate method. More details and an example exist within the docstring for
datasheets.Workbook.batch_update()
.
Getting OAuth Credentials¶
To utilize datasheets, you’ll need to get it connected to your Google Drive and Google Sheets repositories. There are two possible ways to do this: user-granted authorization (‘OAuth Client ID’) or the use of a service account (‘OAuth Service Account’), both of which are implemented through the underlying Google oauth2client library. A description of both authentication mechanisms follows below, but before we can use either mechanism we need to create a project.
Setting Up A Project¶
Head to https://console.developers.google.com and sign in (or sign up if you haven’t yet).
Create a new project. You can title this whatever you want; it won’t be shown to you or any other end user by datasheets. Note that if your organization has put restrictions on who can create projects you may have to reach out to get added to an existing datasheets project (if it exists) or ask to have a project created.
Use the search bar at the top to search for Google Drive API, then click ‘Enable’.
Do the same as above to enable the Google Sheets API.
Click on ‘Credentials’, go to ‘OAuth consent screen’, and add a product name (e.g. ‘datasheets Python library’) and click Save.
OAuth Client ID Access¶
OAuth Client ID access is the kind of auth that shows up when you click ‘Sign in with Google’ on a
page: you authorize the application to access the information attached to a Google account. For
non-ETL-based use, this is the auth you want. The list of possible accounts to sign in with is based
on the gmail accounts tied to that particular browser. The end-user flow, which is entered on
instantiation of a datasheets.Client()
object, would be as follows.
The list of possible accounts is brought up:
After selecting an account (and authorizing access on the next screen, if this is the first time you’ve selected that account), an authorization confirmation screen is presented:
To set this up:
Click on ‘Create Credentials’.
Choose ‘OAuth client ID’.
Select ‘Web application’, input a name, and enter Authorized JavaScript origins and Authorized redirect URIs. The Authorized JavaScript origins are
http://localhost:8888
andhttp://localhost:8080
. The Authorized redirect URIs are the same except with a forward slash (/) appended.After clicking ‘Create’ and then ‘Ok’ on the following screen, click the download button.
Move the file and rename it. By default datasheets will look for this file in
~/.datasheets/client_secrets.json
, but if you’d prefer to place this file elsewhere you just have to specify the path in the$DATASHEETS_SECRETS_PATH
envvar. Assuming you want to use the default location though, first create the~/.datasheets
folder and move the file you downloaded there, renaming it toclient_secrets.json
. You can do this in Finder, but if you’d prefer terminal then follow the commands below, using your client_secrets file’s original name:mkdir ~/.datasheets mv ~/Downloads/client_secret_<really_long>.apps.googleusercontent.com.json ~/.datasheets/client_secrets.json
Congratulations! You’re now set up with client auth and can start using the library! If you only plan to use datasheets on your local machine then you’re done, but if you might need service account access then read on!
OAuth Service Account Access¶
OAuth service accounts are pseudo-users that have their own email address. Documents shared with a service account will be accessible by the service (as a side note, this is how gspread works). This approach is ideal for ETL as a user doesn’t have to manually authorize access each time. However, a major weakness to this method is that anyone who gets hold of the credentials we are about to generate will be able to access all documents shared with this account, which is a weakness that the OAuth Client ID access does not share. With that in mind, the credentials we are about to generate should be kept secure.
To set up service account access:
Click on ‘Create Credentials’ again, this time choosing ‘Service account key’.
Select ‘New service account’, under role select Project -> Browser, and key type ‘JSON’, and click ‘Create’. The file will automatically be downloaded.
As before, save the file to
~/.datasheets/
, this time naming the fileservice_key.json
. Again, you can do this in Finder, but if you’d prefer terminal follow the commands below, using your service_key file’s original name:mv ~/Downloads/datasheets-auth-1e8be8d27209.json ~/.datasheets/service_key.json
As before, if you have a preferred location for this file you can instead place it there and specify
the path to the file to datasheets with the envvar $DATASHEETS_SERVICE_PATH
.
To use datasheets with your service account, create your client with datasheets.Client(service=True)
.
Congratulations! You’re all set up! Now just pip install datasheets
and away you go!
Comparison To gspread¶
gspread is a popular library for reading data from Google Sheets, and it was a big inspiration for this project. datasheets attempts to improve on gspread by:
- Supporting uploading of pandas DataFrames into Google Sheets.
- Ensuring that data pulled from Google Sheets keeps the data type it had within Google Sheets, e.g. datetimes will come in as datetimes, numbers as numbers, etc. Within gspread, non-numbers are generally all converted to strings.
- Allowing users to authenticate with their own Google account, meaning there is no need to create a service account and share all your files with it (though you can still do both of those things). Service accounts can be a security liability (as described below under “OAuth Service Account Access”); being able to use OAuth Client ID access diminishes that concern.
- Providing a number of additional tools for interacting with Google Sheets: format them, add/remove rows and columns, create or delete tabs and workbooks, share or unshare a workbook with users, etc. See below for more details.
- Using more modern, Google-maintained tools (e.g. Google’s google-api-python-client and oauth2client libraries) as opposed to parsing XML feeds.
API Reference¶
An Index of all functionality also exists.
Client class¶
-
class
datasheets.
Client
(service=False, storage=True, user_agent='Python datasheets library')¶ -
__init__
(service=False, storage=True, user_agent='Python datasheets library')¶ Create an authenticated client for interacting with Google Drive and Google Sheets
Parameters: - service (bool) –
Whether to authenticate as a user or as a service account. If service=False, you will be prompted to authorize this instance to access the Google Drive attached to one of your Gmail accounts.
Service-based authorization proceeds using the JSON file located at
$DATASHEETS_SERVICE_PATH
(default:~/.datasheets/service_key.json
).User-based authorization proceeds using the client secrets stored at
$DATASHEETS_SECRETS_PATH
(default:~/.datasheets/client_secrets.json
). Successful authentication by this method creates a set of credentials. By default these credentials are stored at$DATASHEETS_CREDENTIALS_PATH
(default:~/.datasheets/client_credentials.json
), though storage of these credentials can be disabled with storage=False. - storage (bool) –
Whether to use authorized credentials stored at
$DATASHEETS_CREDENTIALS_PATH
(or the default of~/.datasheets/client_credentials.json
). If credentials have not been stored yet or are invalid, whether to store newly obtained credentials at this location.If False, authorization will be requested every time a new Client instance is created. This mode deliberately does not store credentials to disk in order to allow for use of the library in multi-user environments.
- user_agent (str) – The user agent tied to new credentials, if new credentials are required. This is primarily metadata, and thus unless you have a reason to change this the default value is probably fine.
- service (bool) –
-
create_workbook
(filename, folders=())¶ Create a blank workbook with the specific filename
Parameters: Returns: An instance of the newly created workbook
Return type:
-
delete_workbook
(filename=None, file_id=None)¶ Delete a workbook from Google Drive
Either filename (i.e. title) or file_id should be provided. Providing file_id is preferred as it is more precise.
Parameters: Returns: None
-
fetch_folders
(only_mine=False)¶ Fetch all folders shared with this account
Parameters: only_mine (bool) – If True, limit results to only those folders owned by this user Returns: One row per folder listing folder name, ID, most recent modified time, and webview link to the folder Return type: pandas.DataFrame
-
fetch_workbook
(filename=None, file_id=None)¶ Fetch a workbook
Either filename (i.e. title) or file_id should be provided. Providing file_id is preferred as it is more precise.
Parameters: Returns: An instance of the requested workbook
Return type:
-
fetch_workbooks_info
(folder=None)¶ Fetch information on all workbooks shared with this account
Parameters: folder (str) – An optional folder name to limit the results to Returns: One row per workbook listing workbook name, ID, most recent modified time, and webview link to the workbook Return type: pandas.DataFrame
-
Workbook class¶
-
class
datasheets.
Workbook
(filename, file_id, client, drive_svc, sheets_svc)¶ -
__init__
(filename, file_id, client, drive_svc, sheets_svc)¶ Create a datasheets.Workbook instance of an existing Google Sheets doc
This class in not intended to be directly instantiated; it is created by datasheets.Client.fetch_workbook().
Parameters: - filename (str) – The name of the workbook
- file_id (str) – The Google Sheets-assigned ID for the file
- client (datasheets.Client) – The client instance that instantiated this workbook
- drive_svc (googleapiclient.discovery.Resource) – An instance of Google Drive
- sheets_svc (googleapiclient.discovery.Resource) – An instance of Google Sheets
-
batch_update
(body)¶ Apply updates to a workbook or tab using Google Sheets’ spreadsheets.batchUpdate method
Parameters: body (list) – A list of requests, with each request provided as a dict The Google Sheets batch update method is a flexible method exposed by the Google Sheets API that permits effectively all functionality allowed for by the Google Sheets.
For more details on how to use this method, see the following:
Explanation of spreadsheets.batchUpdate method:
List of available request types and the parameters they take:
Example
The following is an example request that would perform only one update operation (the ‘repeatCell’ operation):
request_body = { 'repeatCell': { 'range': { 'sheetId': self.tab_id, 'startRowIndex': 0, 'endRowIndex': self.nrows }, 'cell': { 'userEnteredFormat': { 'horizontalAlignment': horizontal, 'verticalAlignment': vertical, } }, 'fields': 'userEnteredFormat(horizontalAlignment,verticalAlignment)' } } body = {'requests': [request_body]}
-
client
¶ Property for the client instance that instantiated this workbook
-
create_tab
(tabname, nrows=1000, ncols=26)¶ Create a new tab in the given workbook
Parameters: Returns: An instance of the newly created tab
Return type:
-
delete_tab
(tabname)¶ Delete a tab with the given name from the current workbook
Parameters: tabname (str) – The name of the tab to delete Returns: None
-
fetch_permissions
()¶ Fetch information on who is shared on this workbook and their permission level
Returns: One row per email address shared, including the permission level that that email has been granted Return type: pandas.DataFrame
-
fetch_tab
(tabname)¶ Return a datasheets.Tab instance of the given tab associated with this workbook
Parameters: tabname (str) – The name of the tab to fetch Returns: An instance of the requested tab Return type: datasheets.Tab
-
fetch_tab_names
()¶ Show the names of the tabs within the workbook, returned as a pandas.DataFrame.
Returns: One row per tabname within the workbook Return type: pandas.DataFrame
Share this workbook with someone.
Parameters: - email (str) – The email address to share the workbook with
- role (str) – The type of permission to grant. Values must be one of ‘owner’, ‘writer’, or ‘reader’
- notify (bool) – If True, send an email notifying the recipient of their granted permission. These notification emails are the same as what Google sends when a document is shared through Google Drive
- message (str) – If notify is True, the message to send with the email notification
Returns: None
Unshare this workbook with someone.
Parameters: email (str) – The email address that will be unshared Returns: None
-
Tab class¶
-
class
datasheets.
Tab
(tabname, workbook, drive_svc, sheets_svc)¶ -
__init__
(tabname, workbook, drive_svc, sheets_svc)¶ Create a datasheets.Tab instance of an existing Google Sheets tab.
This class in not intended to be directly instantiated; it is created by datasheets.Workbook.fetch_tab().
Parameters: - tabname (str) – The name of the tab
- workbook (datasheets.Workbook) – The workbook instance that instantiated this tab
- drive_svc (googleapiclient.discovery.Resource) – An instance of Google Drive
- sheets_svc (googleapiclient.discovery.Resource) – An instance of Google Sheets
-
add_columns
(n)¶ Add n columns to the given tab
Parameters: n (int) – The number of columns to add Returns: None
-
add_rows
(n)¶ Add n rows to the given tab
Parameters: n (int) – The number of rows to add Returns: None
-
align_cells
(horizontal='LEFT', vertical='MIDDLE')¶ Align all cells in the tab
Parameters: Returns: None
-
alter_dimensions
(nrows=None, ncols=None)¶ Alter the dimensions of the current tab.
If either dimension is left to None, that dimension will not be altered. Note that it is possible to set nrows or ncols to smaller than the current tab dimensions, in which case that data will be eliminated.
Parameters: Returns: None
-
append_data
(data, index=True, autoformat=True)¶ Append data to the existing data in this tab.
If the new data exceeds the tab’s current dimensions the tab will be resized to accommodate it. Data headers will not be included among the appended data as they are assumed to already be among the existing tab data.
If the dimensions of data are larger than the tab’s current dimensions, the tab will automatically be resized to fit it.
Parameters: - data (pandas.DataFrame or dict or list) – The data to be uploaded, formatted as a pandas.DataFrame, a dict of lists, or a list of lists
- index (bool) – If data is a pandas.DataFrame, whether to upload the index as well
Returns: None
-
autoformat
(n_header_rows)¶ Apply default stylings to the tab
This will apply the following stylings to the tab:
- Header rows will be formatted to a dark gray background and off-white text
- Font for all cells will be set to size 10 Proxima Nova
- Cells will be horizontally left-aligned and vertically middle-aligned
- Columns will be resized to display their largest entry
- Empty columns and rows will be trimmed from the tab
Parameters: n_header_rows (int) – The number of header rows (i.e. row of labels / metadata) Returns: None
-
autosize_columns
()¶ Resize the widths of all columns in the tab to fit their data
Returns: None
-
clear_data
()¶ Clear all data from the tab while leaving formatting intact
Returns: None
-
fetch_data
(headers=True, fmt='df')¶ Retrieve the data within this tab.
Efforts are taken to ensure that returned rows are always the same length. If headers=True, this length will be equal to the length of the headers. If headers=False, this length will be equal to the longest row.
In either case, shorter rows will be padded with Nones and longer rows will be truncated (i.e. if there are 3 headers then all rows will have 3 entries regardless of the amount of populated cells they have).
Parameters: Returns: When fmt=’df’ –> pandas.DataFrame
When fmt=’dict’ –> list of dicts, e.g.:
[{header1: row1cell1, header2: row1cell2}, {header1: row2cell1, header2: row2cell2}, ...]
When fmt=’list’ –> tuple of header names, list of lists with row data, e.g.:
([header1, header2, ...], [[row1cell1, row1cell2, ...], [row2cell1, row2cell2, ...], ...])
-
format_font
(font='Proxima Nova', size=10)¶ Set the font and size for all cells in the tab
Parameters: Returns: None
-
format_headers
(nrows)¶ Format the first n rows of a tab.
The following stylings will be applied to these rows:
- Background will be set to dark gray with off-white text
- Font will be set to size 10 Proxima Nova
- Text will be horizontally left-aligned and vertically middle-aligned
- Rows will be made “frozen” so that when the user scrolls these rows stay visible
Parameters: nrows (int) – The number of rows of headers in the tab Returns: None
-
insert_data
(data, index=True, autoformat=True)¶ Overwrite all data in this tab with the provided data.
All existing data in the tab will be removed, even if it might not have been overwritten (for example, if there is 4x2 data already in the tab and only 2x2 data is being inserted).
If the dimensions of data are larger than the tab’s current dimensions, the tab will automatically be resized to fit it.
Parameters: - data (pandas.DataFrame or dict or list) – The data to be uploaded, formatted as a pandas.DataFrame, a dict of lists, or a list of lists
- index (bool) – If data is a pandas.DataFrame, whether to upload the index as well
Returns: None
-
ncols
¶ Property for the number (int) of columns in the tab
-
nrows
¶ Property for the number (int) of rows in the tab
-
tab_id
¶ Property that gives the ID for the tab
-
workbook
¶ Property for the workbook instance that this tab belongs to
-
Helpers¶
-
datasheets.
create_tab_in_existing_workbook
(filename, tabname, file_id=None)¶ Create a new tab in an existing workbook and return an instance of that tab
Either filename (i.e. title) or file_id should be provided. Providing file_id is preferred as it is more precise.
Parameters: Returns: An instance of the newly created tab
Return type:
-
datasheets.
create_tab_in_new_workbook
(filename, tabname, emails=(), role='reader', notify=True, message=None)¶ Create a new tab in a new workbook and return an instance of that tab
Parameters: - filename (str) – The name of the workbook to be created
- tabname (str) – The name of the tab to be created
- emails (str or tuple) – The email address(es) to grant the permission to. This may be one address in string form or a series of addresses in tuple form
- role (str or tuple) – The type of permission(s) to grant. This can be either a tuple of the same size as emails or a single value, in which case all emails are granted that permission level. Values must be one of ‘owner’, ‘writer’, or ‘reader’
- notify (bool) – If True, send an email notifying the recipient(s) of their granted permissions. These notification emails are the same as what Google sends when a document is shared through Google Drive
- message (str) – If notify is True, the message to send with the email notification
Returns: An instance of the newly created tab
Return type:
-
datasheets.helpers.
convert_cell_index_to_label
(row, col)¶ Convert two cell indexes to a string address
Parameters: Note that Google Sheets starts both the row and col indexes at 1.
Example
>>> sheets.convert_cell_index_to_label(1, 1) A1 >>> sheets.convert_cell_index_to_label(10, 40) BH10
Returns: The cell reference as an address (e.g. ‘B6’) Return type: str
-
datasheets.helpers.
convert_cell_label_to_index
(label)¶ Convert a cell label in string form into one based cell indexes of the form (row, col).
Parameters: label (str) – The cell label in string form Note that Google Sheets starts both the row and col indexes at 1.
Example
>>> sheets.convert_cell_label_to_index('A1') (1, 1) >>> sheets.convert_cell_label_to_index('BH10') (10, 40)
Returns: The cell reference in (row_int, col_int) form Return type: tuple
Exceptions¶
-
exception
datasheets.exceptions.
DatasheetsException
¶ Base Exception for all other datasheets exceptions
This is intended to make catching exceptions from this library easier.
-
exception
datasheets.exceptions.
FolderNotFound
¶ Attempting to open non-existent or inaccessible folder
-
exception
datasheets.exceptions.
MultipleWorkbooksFound
¶ Multiple workbooks found for the given filename
-
exception
datasheets.exceptions.
PermissionNotFound
¶ Trying to retrieve non-existent permission for workbook
-
exception
datasheets.exceptions.
TabNotFound
¶ Trying to open non-existent tab
-
exception
datasheets.exceptions.
WorkbookNotFound
¶ Trying to open non-existent or inaccessible workbook
Developing / Testing¶
Getting Set Up¶
First, get your Python environment set up:
mkvirtualenv datasheets
pip install -e . -r requirements-dev.txt
Various testing functionality exists:
make test
- Run tests for both Python 2 and 3pytest
- Run tests for whichever Python version is in your virtualenvmake coverage
- Check code coverage
Manual tests also exist in the tests/manual_testing.ipynb
Jupyter Notebook. To run the manual
tests, install Jupyter Notebook (pip install jupyter notebook
), then run jupyter notebook
,
open the file in the browser, and execute each cell.
Releasing A New Version¶
If you make a PR that gets merged into master, a new version of datasheets can be created as follows.
Increment the
__version__
in thedatasheets/__init__.py
file and commit that change.Push a new git tag to the repo by doing:
- Write the tag message in a dummy file called
tag_message
. We do this to allow multi-line tag messages git tag x.x.x -F tag_message
git push --tags origin master
- Write the tag message in a dummy file called
Run
make release_pypitest
to test that you can release to pypi.Run
make release_pypi
to actually push the release to pypi.