chat-archive: Easy to use offline chat archive

Welcome to the documentation of chat-archive version 4.0.2! The following sections are available:

User documentation

The readme is the best place to start reading. It is aimed at all users and documents the command line interface:

chat-archive: Easy to use offline chat archive

The Python program chat-archive provides a local archive of chat messages that can be viewed and searched on the command line. Supported chat services include Google Talk, Google Hangouts, Slack and Telegram. The program was developed on Linux and currently assumes a UNIX command line environment, although this is not fundamental to the program’s design (for example I could imagine someone building a GUI or web interface using the Python API).

When you add a new account the initial synchronization will download your full conversation history from the chat service in question, which can take quite a while. Later synchronization runs will be much quicker because only updates (new messages and conversations) are downloaded.

Chat messages are downloaded as plain text and when possible also with formatting (encoded as HTML). When viewing chat messages on the terminal the formatted text will be shown.

Python 3.5+ is required due to the asynchronous nature of some of the backends.

Status

This is very young software, developed in a couple of sprints in the summer of 2018, so it’s bound to be full of bugs! The fact that it doesn’t have a test suite doesn’t help. However since creating this program I’ve started using it on a daily basis, so I may very well be the first one to run into most if not all bugs 😇.

There are a lot of implementation details in the code base that I’m not proud of and there’s a ton of features that I would like to add. For example, right now the command line interface is still rather bare bones. I’ve decided to nevertheless publish what I have right now, because in its current state this project is already very useful for me, so it might be useful to others.

I consider the first release to be representative of the functional goals I had in mind when I set out to build this, but I’d love to find the time to refactor the code base once or twice more 😋. Before publishing the first release I had already gone through three or four complete rewrites and each of those rewrites improved the quality of the code, yet I’m still not fully satisfied… Oh well, at least it seems to work 😉.

Installation

The chat-archive package is available on PyPI which means installation should be as simple as:

$ pip3 install chat-archive

Make sure you’re using Python 3.5+ because this is required by dependencies of the chat-archive program.

There’s actually a multitude of ways to install Python packages (e.g. the per user site-packages directory, virtual environments or just installing system wide) and I have no intention of getting into that discussion here, so if this intimidates you then read up on your options before returning to these instructions 😉.

Usage

The command line interface is documented below. For more details about the Python API please refer to the API documentation available on Read the Docs.

Command line

Usage: chat-archive [OPTIONS] [COMMAND]

Easy to use offline chat archive that can gather chat message history from Google Talk, Google Hangouts, Slack and Telegram.

Supported commands:

  • The ‘sync’ command downloads new chat messages from supported chat services and stores them in the local archive (an SQLite database).
  • The ‘search’ command searches the chat messages in the local archive for the given keyword(s) and lists matching messages.
  • The ‘list’ command lists all messages in the local archive.
  • The ‘stats’ command shows statistics about the local archive.
  • The ‘unknown’ command searches for conversations that contain messages from an unknown sender and allows you to enter the name of a new contact to associate with all of the messages from an unknown sender. Conversations involving multiple unknown senders are not supported.

Supported options:

-C, --context=COUNT

Print COUNT messages of output context during ‘chat-archive search’. This works similarly to ‘grep -C’. The default value of COUNT is 3.

-f, --force

Retry synchronization of conversations where errors were previously encountered. This option is currently only relevant to the Google Hangouts backend, because I kept getting server errors when synchronizing a few specific conversations and I didn’t want to keep seeing each of those errors during every synchronization run :-).

-c, --color=CHOICE, --colour=CHOICE

Specify whether ANSI escape sequences for text and background colors and text styles are to be used or not, depending on the value of CHOICE:

  • The values ‘always’, ‘true’, ‘yes’ and ‘1’ enable colors.
  • The values ‘never’, ‘false’, ‘no’ and ‘0’ disable colors.
  • When the value is ‘auto’ (this is the default) then colors will only be enabled when an interactive terminal is detected.

-l, --log-file=LOGFILE

Save logs at DEBUG verbosity to the filename given by LOGFILE. This option was added to make it easy to capture the log output of an initial synchronization that will be downloading thousands of messages.

-p, --profile=FILENAME

Enable profiling of the chat-archive application to make it possible to analyze performance problems. Python profiling data will be saved to FILENAME every time database changes are committed (making it possible to inspect the profile while the program is still running).

-v, --verbose

Increase logging verbosity (can be repeated).

-q, --quiet

Decrease logging verbosity (can be repeated).

-h, --help

Show this message and exit.
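
The CHOICE handling described for the --color option can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual code of the program: the helper name want_colors and the case-insensitive comparison are hypothetical.

```python
import sys

# The accepted values are taken from the option description above.
ENABLE_VALUES = {"always", "true", "yes", "1"}
DISABLE_VALUES = {"never", "false", "no", "0"}


def want_colors(choice="auto", stream=None):
    """Hypothetical helper: decide whether ANSI escape sequences are used."""
    # Assumption: values are compared case-insensitively.
    value = choice.strip().lower()
    if value in ENABLE_VALUES:
        return True
    if value in DISABLE_VALUES:
        return False
    if value == "auto":
        # Only enable colors when the stream is an interactive terminal.
        stream = sys.stdout if stream is None else stream
        return hasattr(stream, "isatty") and stream.isatty()
    raise ValueError("unsupported color choice: %r" % choice)
```

Passing the stream as an argument keeps the ‘auto’ branch easy to exercise without a real terminal.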

The ‘sync’ command

The command chat-archive sync downloads new chat messages using the configured backends and stores the messages in the local SQLite database. Positional arguments can be used to synchronize specific backends or accounts. For example I have two Telegram accounts, a personal account and a work account. The following command will synchronize both of these accounts:

$ chat-archive sync telegram

When I’m only interested in a specific account I can instead do this:

$ chat-archive sync telegram:personal

You can make this as complex as you want:

$ chat-archive sync hangouts slack:work telegram:personal

The command above will synchronize all configured Google Hangouts accounts, the Slack work account and the Telegram personal account. The following table shows the backend names you can use like this:

Backend name    Chat service
gtalk           Google Talk
hangouts        Google Hangouts
slack           Slack
telegram        Telegram

The ‘search’ command

The command chat-archive search performs a keyword search through the chat messages in the local SQLite database and renders the search results on the terminal. Keywords are provided as positional arguments to the search command and trigger a case insensitive AND search through the following message metadata:

  • The name of the backend (see the table above).
  • The name of the account (default or a user defined name).
  • The name of the conversation (relevant for group conversations).
  • The full name of the contact that sent the message.
  • The email address of the contact that sent the message.
  • The timestamp of the message. Any prefix of the date format YYYY-MM-DD HH:MM:SS should work, judging by the date/time searches that I’ve tried so far. So for example the keyword 2018 will match all messages from that year, 2018-08 will match all messages in a specific month, etc.
  • The text of the message. The plain text chat message as well as the HTML formatted chat message (when available) are searched, which enables searching for semantically meaningful HTML data like hyperlink targets.
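
The case insensitive AND search over the fields listed above can be sketched as follows. This is a simplified illustration, not the query the program actually runs against SQLite: the function name, the dictionary keys and the sample message are hypothetical, and a plain substring test is used, which is broader than the timestamp prefix matching described above.

```python
def matches_all_keywords(message, keywords):
    """Return True when every keyword occurs in at least one searched field."""
    # Lower-case every non-empty field once so the comparison ignores case.
    fields = [str(value).lower() for value in message.values() if value]
    return all(
        any(keyword.lower() in field for field in fields)
        for keyword in keywords
    )


# Hypothetical message with the metadata fields named in the list above.
message = {
    "backend": "telegram",
    "account": "personal",
    "conversation": "Weekend plans",
    "sender_name": "Alice Example",
    "sender_email": "alice@example.com",
    "timestamp": "2018-08-14 19:32:10",
    "text": "See https://example.com/tickets",
    "html": '<a href="https://example.com/tickets">tickets</a>',
}
```

With this message, the keywords 2018-08 and ALICE both match, while adding a keyword that appears in no field makes the whole search fail, because all keywords are required.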

The search results reported on the terminal include surrounding chat messages from the matching conversations, to provide additional context. You can control how many surrounding chat messages are rendered using the -C, --context command line option; the value 0 omits the context.
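
The way context works, in the spirit of ‘grep -C’, can be illustrated with a short sketch. It operates on a plain list that stands in for one conversation; the function name and arguments are hypothetical and the real program gathers context from the database instead.

```python
def select_with_context(messages, is_match, context=3):
    """Return matching messages plus `context` neighbours on each side."""
    keep = set()
    for index, message in enumerate(messages):
        if is_match(message):
            # Clamp the window to the boundaries of the conversation.
            start = max(0, index - context)
            stop = min(len(messages), index + context + 1)
            keep.update(range(start, stop))
    # Sorting restores chronological order and merges overlapping windows.
    return [messages[index] for index in sorted(keep)]
```

Collecting indexes in a set means that two nearby matches share their context instead of printing the same message twice.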

The ‘list’ command

The command chat-archive list renders a listing of all chat messages in the database on the terminal.

Due to the gathering of context the chat-archive search command can be rather slow and this is why I added the chat-archive list command early in the development of the project (it’s faster because it doesn’t have to gather context). Since then I’ve collected 226,941 chat messages, completely negating the usefulness of the chat-archive list command 😇.

In any case this can be considered a very simple form of export functionality, so I’ve decided to keep the chat-archive list command for now, despite its limited usefulness once one actively starts using the chat-archive program.

The ‘stats’ command

The command chat-archive stats reports some statistics about the contents of the local SQLite database. Here’s what that looks like for me at the time of writing:

Statistics about ~/.local/share/chat-archive/database.sqlite3:

 - Number of contacts: 284
 - Number of conversations: 5803
 - Number of messages: 226941
 - Database file size: 90.81 MB
 - Size of 226941 plain text chat messages: 18.7 MB
 - Size of 13409 HTML formatted chat messages: 4.25 MB

The ‘unknown’ command

The first time I synchronized the thousands of chat messages in my Google Hangouts account I was very disappointed to find out that all metadata about contacts whose accounts had since been deleted was lost (no names, no email addresses, nothing).

This is why I added the chat-archive unknown command. It searches the local database for private conversations that contain messages from an unknown sender and prompts you to enter a name for the contact. When you enter a (nonempty) name a new contact is created and the messages in the conversation which have no sender are associated with the new contact.

Weirdly enough the Google Mail archive of chat messages was able to show me names for most of the contacts for which the Google Hangouts API no longer reported any useful information. This is how I was able to (manually) reconstruct this bit of history.

If the Google Mail archive had not provided me with this information I still would have been able to reconstruct the senders of 90% of these conversations simply by the fact that quite a few conversations start with “Hi $name” and I still have “client side chat archive backups” (Pidgin) from 2011-2015.

Configuration files

If you’re going to be synchronizing your chat message history frequently you can use a configuration file to define credentials for the chat services that you are interested in.

Configuration files are text files in the subset of ini syntax supported by Python’s configparser module. They can be located in the following places:

Directory    Main configuration file       Modular configuration files
/etc         /etc/chat-archive.ini         /etc/chat-archive.d/*.ini
~            ~/.chat-archive.ini           ~/.chat-archive.d/*.ini
~/.config    ~/.config/chat-archive.ini    ~/.config/chat-archive.d/*.ini

The available configuration files are loaded in the order given above, so that user specific configuration files override system wide configuration files.
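
The override behaviour follows directly from how Python’s configparser module reads a list of files. The sketch below is an assumption-laden illustration rather than the loader the program uses: the function names are hypothetical, and reading each main file before its modular files (sorted by name) is a guess about the ordering within one directory.

```python
import configparser
import glob
import os


def candidate_files(root="/", home="~"):
    """List configuration files in the documented loading order."""
    home = os.path.expanduser(home)
    bases = [
        os.path.join(root, "etc", "chat-archive"),
        os.path.join(home, ".chat-archive"),
        os.path.join(home, ".config", "chat-archive"),
    ]
    filenames = []
    for base in bases:
        filenames.append(base + ".ini")
        # Assumption: modular files are read after the main file, sorted.
        filenames.extend(sorted(glob.glob(os.path.join(base + ".d", "*.ini"))))
    return filenames


def load_config(filenames):
    """Parse the given files; values in later files override earlier ones."""
    parser = configparser.ConfigParser()
    # read() silently skips files that do not exist.
    parser.read(filenames)
    return parser
```

Because a single parser object reads all of the files in sequence, an option set in ~/.chat-archive.ini replaces the same option from /etc/chat-archive.ini.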

The special configuration file section chat-archive defines general options. Right now only the operator-name option is supported here. All other sections are specific to a chat account and encode the name of the backend and the name of the account in the name of the section by delimiting the two values with a colon. Here’s an example based on my configuration, that shows the supported options:

[chat-archive]
operator-name = ...

[hangouts:work]
email-address = ...
password = ...
# Alternatively:
password-name = ...

[slack:work]
api-token = ...
# Alternatively:
api-token-name = ...

[gtalk:work]
email = ...
password = ...
# Alternatively:
password-name = ...

[telegram:personal]
api-hash = ...
api-id = ...
phone-number = ...

[telegram:work]
api-hash = ...
api-id = ...
phone-number = ...
# Alternatively:
api-hash-name = ...
api-id-name = ...

When an account is configured but the configuration doesn’t define a required secret then you will be prompted to provide that secret every time you run the chat-archive sync command.
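
The fallback from configured secrets to an interactive prompt can be sketched like this. The function name and the prompt text are hypothetical; only the behaviour (use the configured value when present, otherwise ask) is taken from the paragraph above.

```python
import getpass


def resolve_secret(options, name, prompt=getpass.getpass):
    """Return a configured secret or ask for it interactively."""
    value = options.get(name)
    if value:
        return value
    # getpass hides what is typed, which suits passwords and API tokens.
    return prompt("Please enter the %s: " % name)
```

The prompt function is a parameter so that the interactive branch can be replaced in scripts or tests.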

The values of the api-token-name, password-name, api-hash-name and api-id-name options identify secrets in ~/.password-store to use. This provides an alternative somewhere in between the following two extremes:

  • Always typing your secrets interactively (because you don’t want them to be stored in the chat-archive configuration file, which is understandable from a security perspective).
  • Storing your secrets directly in the chat-archive configuration files (so you don’t have to type secrets interactively) thereby exposing them to all software running on your computer.

Because pass (the password manager that maintains ~/.password-store) can use gpg-agent you only have to type a single master password to unlock the secrets required to synchronize any number of chat accounts.
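
Reading a secret from the password store boils down to running ‘pass show NAME’ and taking the first line of its output, which by convention holds the password. The sketch below shows one way to do that from Python; the function names are hypothetical and this is not necessarily how chat-archive itself talks to pass.

```python
import subprocess


def first_line(text):
    """Return the first line of a pass entry (conventionally the secret)."""
    lines = text.splitlines()
    return lines[0] if lines else ""


def get_secret_from_pass(name):
    """Ask the `pass` program to decrypt the entry with the given name."""
    result = subprocess.run(
        ["pass", "show", name],
        check=True,  # raise CalledProcessError when pass fails
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode the output to text
    )
    return first_line(result.stdout)
```

When gpg-agent already holds the unlocked key, the call returns without asking for a passphrase.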

The local database

The chat-archive program uses an SQLite database to store the chat messages that it collects. Because the whole point of the program is to safeguard the long term archival of chat messages, SQLAlchemy and Alembic are used to support database schema migrations. This is intended to ensure a reliable upgrade path for future enhancements without data loss.

There’s one significant exception I can think of: The current version of the chat-archive program doesn’t synchronize images and other multimedia files, only text messages are stored in the local database. If support for images is added in a later release (I’m not committing to this, but I am considering it) and collecting these is important to you then you may have to rebuild your database if and when this support is added.

You can change the location of the SQLite database and other data files by setting the environment variable $CHAT_ARCHIVE_DIRECTORY. Making a backup of your chat archive is as simple as saving a copy of the database file ~/.local/share/chat-archive/database.sqlite3 to another storage medium. Please keep in mind that this database has the potential to contain a lot of sensitive data, so I strongly advise you to use disk encryption.
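
A backup can also be scripted. The sketch below resolves the database location from $CHAT_ARCHIVE_DIRECTORY (falling back to the documented default) and copies the database with the online backup API of the sqlite3 module, which requires Python 3.7 or newer. Both function names are hypothetical, and using the backup API instead of a plain file copy is a choice made for this example, not something the program does.

```python
import os
import sqlite3

DEFAULT_DIRECTORY = "~/.local/share/chat-archive"


def find_database(environ=os.environ):
    """Resolve the database path from the environment or the default."""
    directory = environ.get("CHAT_ARCHIVE_DIRECTORY") or DEFAULT_DIRECTORY
    return os.path.join(os.path.expanduser(directory), "database.sqlite3")


def backup_database(source, target):
    """Copy an SQLite database using Connection.backup() (Python 3.7+)."""
    src = sqlite3.connect(source)
    dst = sqlite3.connect(target)
    try:
        # Copies all pages, even if another process is using the source.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```

A call such as backup_database(find_database(), "/mnt/backup/chat-archive.sqlite3") would then write the copy; the target path here is only an example.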

Supported chat services

The following backends are currently available:

Chat service       Description
Google Talk        At one time this was the primary chat service of Google. It was based on (or at least cooperated well with) XMPP. My personal chat archive of Google Talk messages ends on 2013-12-12.
Google Hangouts    The successor to Google Talk. Interestingly enough my personal chat archive of Google Hangouts messages starts on 2013-10-30 (what’s interesting to me is the overlap with the date above).
Slack              Love it or hate it, when all of your colleagues are using it you can’t really get around it. Actually now that I write it down like that I can’t help but think of WhatsApp (where the “peer pressure” comes from family instead of colleagues).
Telegram           A popular alternative to WhatsApp from Russia, without the Facebook baggage 😇 (which is not to say that the company behind Telegram can’t be just as evil).

In the future more backends may be added:

  • I’ve been contemplating scraping “WhatsApp Web” using something like Selenium. It would get ugly and nasty, the resulting backend would be fragile at best, but having those messages available might just be worth it…
  • I’m considering writing a chat log parser for the HTML chat logs that Pidgin generated ten years ago (circa 2008) because I have megabytes of such chat logs stored in backups 🙂.

History

The fragmented nature of digital communication, where messages come to you via numerous channels (including multiple chat services), has bothered me for years now. Finding things again can actually become a challenge 😇. Tangentially related is the realization that these chat services come and go, taking with them years of chat history, lost forever. I’m looking at you Google 😉.

Given that I am a programmer by trade and heart, I’ve been itching for several years now to solve both of these problems at the same time by creating a computer program that downloads and stores the chat message history of multiple chat services into a single local database, available for searching and trivially easy to back up.

For what it’s worth I didn’t start out with the goal of “full fidelity” chat history backup including images and other multimedia, although I may eventually decide to implement it anyway. What I initially set out to build was a local, searchable database of textual chat messages collected from multiple chat services, with an easy way to add support for new chat services.

Contact

The latest version of chat-archive is available on PyPI and GitHub. The documentation is hosted on Read the Docs and includes a changelog. For bug reports please create an issue on GitHub. If you have questions, suggestions, etc. feel free to send me an e-mail at peter@peterodding.com.

License

This software is licensed under the MIT license.

© 2018 Peter Odding.

Here’s a quick overview of the licenses of the dependencies:

Dependency    License
Alembic       MIT license
emoji         BSD license
hangups       MIT license
Slacker       Apache Software License
SQLAlchemy    MIT license
Telethon      MIT license

Shortly before publishing this project I got worried that I had included a GPL dependency which (if I understand correctly) would require me to publish under GPL as well, even though I’ve been consistently publishing my open source projects under the MIT license since 2010.

After assembling the table above I can confidently say that this is not the case 😇. The dependencies that are not listed in the table above are projects of mine, all of them published under the same MIT license as the chat-archive program (assuming I keep this up-to-date as new dependencies are added).

API documentation

The following API documentation is automatically generated from the source code:

API documentation

This documentation is based on the source code of version 4.0.2 of the chat-archive package. The following modules are available:

chat_archive

Python API for the chat-archive program.

chat_archive.DEFAULT_ACCOUNT_NAME = 'default'

The name of the default account (a string).

class chat_archive.ChatArchive(*args, **kw)

Python API for the chat-archive program.

You can set the values of the data_directory, database_file and force properties by passing keyword arguments to the class initializer.

Here’s an overview of the ChatArchive class:

Superclass: SchemaManager
Public methods: commit_changes(), get_accounts_for_backend(), get_accounts_from_config(), get_accounts_from_database(), get_backend_name(), get_backends_and_accounts(), initialize_backend(), is_operator(), load_backend_module(), parse_account_expression(), search_messages() and synchronize()
Properties: alembic_directory, backends, config, config_loader, data_directory, database_file, declarative_base, force, import_stats, num_contacts, num_conversations, num_html_messages, num_messages and operator_name

alembic_directory

The pathname of the directory containing Alembic migration scripts (a string).

The value of this property is computed at runtime based on the value of __file__ inside of the chat_archive/__init__.py module.

backends

A dictionary of available backends (names and dotted paths).

>>> from chat_archive import ChatArchive
>>> archive = ChatArchive()
>>> print(archive.backends)
{'gtalk': 'chat_archive.backends.gtalk',
 'hangouts': 'chat_archive.backends.hangouts',
 'slack': 'chat_archive.backends.slack',
 'telegram': 'chat_archive.backends.telegram'}

Note

The backends property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.
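
The compute-once-and-cache behaviour described in this note can be demonstrated with functools.cached_property from the standard library (Python 3.8+). This is a stand-in chosen for illustration; it is not the lazy_property implementation that ChatArchive uses, and the Example class is hypothetical.

```python
import functools


class Example:
    """Hypothetical class with one property that is computed only once."""

    def __init__(self):
        self.calls = 0

    @functools.cached_property
    def backends(self):
        # This body runs on first access; the result is then cached
        # in the instance dictionary and returned directly afterwards.
        self.calls += 1
        return {"gtalk": "chat_archive.backends.gtalk"}
```

Accessing example.backends twice leaves example.calls at 1, because the second access never reaches the method body.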

config

A dictionary with general user defined configuration options.

Note

The config property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

config_loader

A ConfigLoader object that provides access to the configuration.

Configuration files are text files in the subset of ini syntax supported by Python’s configparser module. They can be located in the following places:

Directory    Main configuration file       Modular configuration files
/etc         /etc/chat-archive.ini         /etc/chat-archive.d/*.ini
~            ~/.chat-archive.ini           ~/.chat-archive.d/*.ini
~/.config    ~/.config/chat-archive.ini    ~/.config/chat-archive.d/*.ini

The available configuration files are loaded in the order given above, so that user specific configuration files override system wide configuration files.

Note

The config_loader property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

declarative_base

The base class for declarative models defined using SQLAlchemy.

data_directory

The pathname of the directory where data files are stored (a string).

The environment variable $CHAT_ARCHIVE_DIRECTORY can be used to set the value of this property. When the environment variable isn’t set the default value ~/.local/share/chat-archive is used (where ~ is expanded to the profile directory of the current user).

Note

The data_directory property is a custom_property. You can change the value of this property using normal attribute assignment syntax. This property’s value is computed once (the first time it is accessed) and the result is cached. To clear the cached value you can use del or delattr().

database_file

The absolute pathname of the SQLite database file (a string).

This defaults to ~/.local/share/chat-archive/database.sqlite3 (with ~ expanded to the home directory of the current user) based on data_directory.

Note

The database_file property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().
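
The semantics of such a property (a computed default, plain assignment to override it, del to reset) can be reproduced with a small descriptor. The sketch below is a hypothetical illustration of those semantics; the class names and the /tmp path are made up and the code is not taken from the library that provides mutable_property.

```python
class ResettableProperty:
    """Descriptor sketch: computed default, assignable, reset with del."""

    def __init__(self, compute):
        self.compute = compute
        self.name = compute.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # An assigned value wins over the computed default.
        if self.name in obj.__dict__:
            return obj.__dict__[self.name]
        return self.compute(obj)

    def __set__(self, obj, value):
        obj.__dict__[self.name] = value

    def __delete__(self, obj):
        # Forget the assigned value so the default is computed again.
        obj.__dict__.pop(self.name, None)


class Settings:
    """Hypothetical owner class, loosely modelled on database_file."""

    data_directory = "/tmp/chat-archive"

    @ResettableProperty
    def database_file(self):
        return self.data_directory + "/database.sqlite3"
```

Because the descriptor defines __set__, Python consults it on every access, so the override stored in the instance dictionary is only returned through __get__.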

force

Retry synchronization of conversations where errors were previously encountered (a boolean, defaults to False).

Note

The force property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

import_stats

Statistics about objects imported by backends (a BackendStats object).

Note

The import_stats property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

num_contacts

The total number of chat contacts in the local archive (a number).

num_conversations

The total number of chat conversations in the local archive (a number).

num_html_messages

The total number of chat messages with HTML formatting in the local archive (a number).

num_messages

The total number of chat messages in the local archive (a number).

operator_name

The full name of the person using the chat-archive program (a string or None).

The value of operator_name is used to address the operator of the chat-archive program in first person instead of third person. You can change the value in the configuration file:

[chat-archive]
operator-name = ...

The default value in case none has been specified in the configuration file is taken from /etc/passwd using get_full_name().

Note

The operator_name property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

commit_changes()

Show import statistics when committing database changes to disk.

get_accounts_for_backend(backend_name)

Select the configured and/or previously synchronized account names for the given backend.

get_accounts_from_database(backend_name)

Get the names of the accounts that are already in the database for the given backend.

get_accounts_from_config(backend_name)

Get the names of the accounts configured for the given backend in the configuration file.

get_backend_name(backend_name)

Get a human friendly name for the given backend.

get_backends_and_accounts(*backends)

Select backends and accounts to synchronize.

initialize_backend(backend_name, account_name)

Load a chat archive backend module and initialize the backend that it defines.

Parameters:
  • backend_name – The name of the backend (one of the strings ‘gtalk’, ‘hangouts’, ‘slack’ or ‘telegram’).
  • account_name – The name of the account (a string).
Returns:

A ChatArchiveBackend object.

Raises:

Exception when the backend doesn’t define a subclass of ChatArchiveBackend.

is_operator(contact)

Check whether the full name of the given contact matches operator_name.

load_backend_module(backend_name)

Load a chat archive backend module.

Parameters: backend_name – The name of the backend (one of the strings ‘gtalk’, ‘hangouts’, ‘slack’ or ‘telegram’).
Returns: The loaded module.

parse_account_expression(value)

Parse a backend:account expression.

Parameters: value – The backend:account expression (a string).
Returns: A tuple with two values:
  1. The name of a backend (a string).
  2. The name of an account (a string, possibly empty).

search_messages(keywords)

Search the chat messages in the local archive for the given keyword(s).

synchronize(*backends)

Download new chat messages.

Parameters: backends – Any positional arguments limit the synchronization to backends whose name matches one of the strings provided as positional arguments.

If the name of a backend contains a colon the name is split into two:

  1. The backend name.
  2. An account name.

This way one backend can synchronize multiple named accounts into the same local database without causing confusion during synchronization about which conversations, contacts and messages belong to which account.
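
The splitting of backend:account expressions can be sketched with str.partition(). Both helpers below are hypothetical illustrations of the documented behaviour and are not the methods of ChatArchive; in particular, using None to mean "all accounts of this backend" is an assumption of this sketch.

```python
def split_expression(value):
    """Split 'backend:account' into (backend, account); account may be ''."""
    backend, _, account = value.partition(":")
    return backend, account


def group_by_backend(expressions):
    """Map backend names to a set of accounts, or None for all accounts."""
    selection = {}
    for expression in expressions:
        backend, account = split_expression(expression)
        if not account:
            # A bare backend name selects every account of that backend.
            selection[backend] = None
        elif backend not in selection:
            selection[backend] = {account}
        elif selection[backend] is not None:
            selection[backend].add(account)
    return selection
```

Applied to the arguments hangouts, slack:work and telegram:personal, this yields one entry per backend, with the Hangouts entry set to None and the other two holding a single account name each.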

class chat_archive.BackendStats

Statistics about chat message synchronization backends.

__init__()

Initialize a BackendStats object.

__enter__()

Alias for push().

__exit__(exc_type=None, exc_value=None, traceback=None)

Alias for pop().

__getattr__(name)

Get the value of a counter from the current scope.

__setattr__(name, value)

Set the value of a counter in the current scope.

pop()

Remove the inner scope and merge its counters into the outer scope.

push()

Create a new inner scope with all counters reset to zero.

show()

Show statistics about imported conversations, messages, contacts, etc.

scope

The current scope (a collections.defaultdict object).
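
The push/pop semantics described for BackendStats (a fresh inner scope with zeroed counters that is merged into the outer scope when it ends) can be sketched with a stack of defaultdict objects. The ScopedCounters class is a hypothetical reduction: it exposes counters through the scope dictionary only and leaves out the attribute access and the show() method of the real class.

```python
from collections import defaultdict


class ScopedCounters:
    """Hypothetical sketch of nested counter scopes."""

    def __init__(self):
        self.stack = [defaultdict(int)]

    @property
    def scope(self):
        """The current (innermost) scope."""
        return self.stack[-1]

    def push(self):
        """Create a new inner scope with all counters at zero."""
        self.stack.append(defaultdict(int))

    def pop(self):
        """Remove the inner scope and merge its counters into the outer one."""
        if len(self.stack) < 2:
            raise RuntimeError("cannot pop the outermost scope")
        inner = self.stack.pop()
        for name, value in inner.items():
            self.scope[name] += value

    def __enter__(self):
        self.push()
        return self

    def __exit__(self, exc_type=None, exc_value=None, traceback=None):
        self.pop()
```

Used as a context manager, the object lets per-account totals be collected inside a with block while the grand total keeps accumulating in the outer scope.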

chat_archive.backends

Namespace for chat archive backends.

The following chat archive backends have been implemented so far:

class chat_archive.backends.ChatArchiveBackend(**kw)

Abstract base class for chat-archive backends.

When you initialize a ChatArchiveBackend object you are required to provide values for the account_name, archive, backend_name and stats properties, which you do by passing keyword arguments to the class initializer.

Here’s an overview of the ChatArchiveBackend class:

Superclass: PropertyManager
Public methods: find_contact_by_attributes(), find_contact_by_email_address(), find_contact_by_external_id(), find_contact_by_telephone_number(), get_or_create_contact(), get_or_create_conversation(), get_or_create_email_address(), get_or_create_message(), get_or_create_object(), get_or_create_telephone_number(), have_message(), pre_process_text() and synchronize()
Properties: account, account_name, archive, backend_name, config, external_id_cache, redirect_stripper, session and stats

account

The Account object corresponding to account_name and backend_name.

Note

The account property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

account_name

The name of the chat account that is being synchronized (a string).

The value of account_name needs to be set by the caller and is used to “get or create” the account object on demand.

Note

The account_name property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named account_name (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

archive

The ChatArchive that is using this backend.

Note

The archive property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named archive (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

backend_name

The name of the chat archive backend (a short alphanumeric string).

The value of backend_name is used to “get or create” the account object on demand.

Note

The backend_name property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named backend_name (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

config

The configuration options for this backend and account (a dictionary).

Note

The config property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

external_id_cache

A dictionary mapping external IDs to Contact objects.

Note

The external_id_cache property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

redirect_stripper

A RedirectStripper object.

Note

The redirect_stripper property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

session

Shortcut for the session property of archive.

Note

The session property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

stats

A BackendStats object.

Note

The stats property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named stats (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

find_contact_by_attributes(attributes)

Find a contact based on their external ID, an email address or a telephone number.

Parameters: attributes – A dictionary with any of the following keys:

  • external_id (string value)
  • email_addresses (list of strings)
  • telephone_numbers (list of strings)

Returns: A Contact object or None.

find_contact_by_email_address(value)

Find a contact based on their email address.

Parameters: value – An email address (a string).
Returns: A Contact object or None.

find_contact_by_external_id(external_id)

Find a contact based on their ‘external ID’.

Parameters: external_id – The external ID (a string).
Returns: A Contact object or None.

This method uses external_id_cache to speed up lookup of contacts by their external ID.

find_contact_by_telephone_number(value)

Find a contact based on their telephone number.

Parameters: value – A telephone number (a string).
Returns: A Contact object or None.

get_or_create_contact(**attributes)

Get or create a contact object.

Parameters: attributes – The names and values of model attributes, used to find existing contacts and create new ones.
Returns: A Contact object.

This method serves three distinct purposes:

  1. Finding existing contacts by their ‘external ID’ or one of their email addresses or telephone numbers.
  2. Creating new contacts (based on the given attributes).
  3. Updating existing contacts (based on the given attributes).

Here’s an overview of supported attributes:

  • The external_id attribute (whose value is expected to be a string).
  • The full_name attribute (whose value is expected to be a string) is split into separate first_name and last_name attributes.
  • The attributes email_address and telephone_number (whose values are expected to be strings) are converted to their plural forms email_addresses and telephone_numbers (lists of strings).
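
The attribute handling listed above could look roughly like the following. This is a sketch under stated assumptions: normalize_contact_attributes is a hypothetical helper, and splitting the full name on the first space is a guess, not necessarily how chat-archive does it.

```python
def normalize_contact_attributes(**attributes):
    """Hypothetical helper that mimics the attribute handling described above."""
    normalized = dict(attributes)
    full_name = normalized.pop("full_name", None)
    if full_name:
        # Assumption: the first word is the first name, the rest the last name.
        first_name, _, last_name = full_name.strip().partition(" ")
        normalized["first_name"] = first_name
        normalized["last_name"] = last_name.strip()
    for singular, plural in (
        ("email_address", "email_addresses"),
        ("telephone_number", "telephone_numbers"),
    ):
        value = normalized.pop(singular, None)
        if value:
            # Copy the list so the caller's list is never modified in place.
            normalized[plural] = list(normalized.get(plural, [])) + [value]
    return normalized


result = normalize_contact_attributes(
    full_name="Jane van Doe",
    email_address="jane@example.com",
)
```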
get_or_create_conversation(external_id, **attributes)[source]

Get or create a Conversation object.

Parameters:
  • external_id – The external ID of the conversation (a string).
  • attributes – Any optional attributes to set when creating a new conversation.
Returns:

Refer to get_or_create_object().

get_or_create_message(conversation, **attributes)[source]

Get or create a Message object.

Parameters:
  • conversation – The Conversation in which the message originated.
  • attributes – Any optional attributes to set when creating a new message.
Returns:

Refer to get_or_create_object().

get_or_create_email_address(email_address)[source]

Get or create an EmailAddress object.

Parameters:email_address – The email address (a string).
Returns:An EmailAddress object.
get_or_create_object(model, required, optional=None)[source]

Find an existing object in the local database or create a new object.

Parameters:
  • model – The model to query.
  • required – A dictionary with the key/value pairs that should be used to search for an existing object.
  • optional – Any optional attributes to set when creating a new object.
Returns:

A tuple with two values:

  1. True if the object was created, False if it already existed.
  2. The object (an instance of model).
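
To illustrate the return value, here is a minimal get-or-create sketch that uses a plain list of dictionaries in place of the local database. The get_or_create function and the store are hypothetical; the real method queries a database model.

```python
def get_or_create(store, required, optional=None):
    """Return a (created, obj) tuple for a list of dictionaries."""
    for obj in store:
        # Only the required key/value pairs are used to find existing objects.
        if all(obj.get(key) == value for key, value in required.items()):
            return False, obj
    # Nothing matched: create a new object from both sets of attributes.
    obj = dict(required)
    obj.update(optional or {})
    store.append(obj)
    return True, obj


store = []
created, first = get_or_create(store, {"external_id": "abc"}, {"name": "Test"})
created_again, second = get_or_create(store, {"external_id": "abc"}, {"name": "Ignored"})
```

In this sketch the optional attributes only take effect when a new object is created, matching the description of the optional parameter.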

get_or_create_telephone_number(telephone_number)[source]

Get or create a TelephoneNumber object.

Parameters:telephone_number – The telephone number (a string containing a number).
Returns:A TelephoneNumber object.
have_message(conversation, external_id)[source]

Check if a message exists in the local database.

Parameters:
  • conversation – The Conversation that contains the message.
  • external_id – The unique id of the message (a string).
Returns:

True when the message exists, False if it doesn’t.

pre_process_text(attributes)[source]

Pre-process the text and HTML of a chat message.

Parameters:attributes – A dictionary with Message attributes.

This method works as follows:

  1. The text is pre-processed using strip_redirects().
  2. The html is pre-processed using RedirectStripper.
  3. When the resulting HTML exactly equals the plain text chat message, the html key in attributes is removed.
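
The three steps above can be sketched as follows. The two callables are hypothetical stand-ins for strip_redirects() and RedirectStripper, whose behaviour is not reproduced here; only the control flow is shown.

```python
def pre_process(attributes, strip_text, strip_html):
    """Apply the three steps described above to a dictionary of attributes."""
    if attributes.get("text"):
        attributes["text"] = strip_text(attributes["text"])
    if attributes.get("html"):
        attributes["html"] = strip_html(attributes["html"])
        # Drop the HTML when it adds nothing over the plain text.
        if attributes["html"] == attributes.get("text"):
            del attributes["html"]
    return attributes


def identity(value):
    """Placeholder pre-processor that leaves its input unchanged."""
    return value


plain = pre_process({"text": "hi", "html": "hi"}, identity, identity)
rich = pre_process({"text": "hi", "html": "<b>hi</b>"}, identity, identity)
```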
synchronize()[source]

This instance method must be implemented by subclasses.

chat_archive.backends.gtalk

Synchronization logic for the Google Talk backend of the chat-archive program.

The Google Talk backend uses the IMAP protocol to discover and download the messages available in the chats_folder of your Google Mail account. The following requirements need to be met in order to use this backend:

  • You need to enable IMAP access to your Google Mail account.
  • You may need to specifically enable IMAP access to the chats_folder (this turned out to be necessary for me).

Before developing this module in June 2018 I had never implemented any IMAP automation [1] so I wasn’t that familiar with the protocol and I didn’t know about message UIDs. The Unique ID in IMAP protocol blog post provided me with some useful details about the semantics of message UIDs.

This backend assumes and requires that the Google Mail servers provide message UIDs that are stable across sessions (this enables discovery of new messages). My testing implies that this is the case, because it seems to work fine! :-)

[1] This is despite operating my own IMAP server for the past ten years, so I was already familiar with IMAP from the perspective of a user as well as a server administrator.
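
The reliance on stable message UIDs described above comes down to a set difference between the UIDs reported by the server and the UIDs that were already imported. The following is a minimal sketch; find_new_uids is a hypothetical helper and not part of the backend's API.

```python
def find_new_uids(server_uids, imported_uids):
    """Return the UIDs present on the server but not yet imported, sorted."""
    return sorted(set(server_uids) - set(imported_uids))


new_uids = find_new_uids(server_uids=[1, 2, 3, 5], imported_uids=[1, 2])
```

This only gives correct results when a given UID keeps referring to the same message between sessions, which is the assumption stated above.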
chat_archive.backends.gtalk.FRIENDLY_NAME = 'Google Talk'

A user friendly name for the chat service supported by this backend (a string).

chat_archive.backends.gtalk.NAMESPACED_TAG_PATTERN = re.compile('^{[^}]+}(\\S+)$')

Compiled regular expression to match XML tag names with a name space.
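
As an illustration, the pattern can be used to strip the name space from tag names in the form that xml.etree.ElementTree reports them. The strip_namespace helper and the sample tag are made up for this example.

```python
import re

NAMESPACED_TAG_PATTERN = re.compile('^{[^}]+}(\\S+)$')


def strip_namespace(tag):
    """Return the tag name without its {namespace} prefix (if any)."""
    match = NAMESPACED_TAG_PATTERN.match(tag)
    return match.group(1) if match else tag


plain_tag = strip_namespace("{jabber:client}message")
untouched = strip_namespace("body")
```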

chat_archive.backends.gtalk.BOGUS_EMAIL_PATTERN = re.compile('^private-chat(-[0-9a-f]+)+@groupchat.google.com$', re.IGNORECASE)

Compiled regular expression to recognize private messages in group conversations.
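
A short sketch of how the pattern behaves. The is_bogus_email helper and both email addresses are made up for this example.

```python
import re

BOGUS_EMAIL_PATTERN = re.compile(
    '^private-chat(-[0-9a-f]+)+@groupchat.google.com$', re.IGNORECASE
)


def is_bogus_email(address):
    """Check whether an address looks like a private group chat address."""
    return BOGUS_EMAIL_PATTERN.match(address) is not None


bogus = is_bogus_email("private-chat-00-AB-cd12@groupchat.google.com")
regular = is_bogus_email("someone@example.com")
```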

class chat_archive.backends.gtalk.GoogleTalkBackend(**kw)[source]

The Google Talk backend for the chat-archive program.

This backend supports the following configuration options:

Option         Description
chats-folder   See chats_folder.
imap-server    See imap_server.
email          The email address used to sign in to your Google Mail account.
password-name  The name of a password in ~/.password-store to use.
password       See password.

If you set password-name then password doesn’t have to be set. If neither password nor password-name has been set then you will be prompted for your password every time you synchronize.

You can set the values of the chats_folder and imap_server properties by passing keyword arguments to the class initializer.

Here’s an overview of the GoogleTalkBackend class:

Superclass: ChatArchiveBackend
Public methods: check_response(), contact_from_header(), contact_from_jid(), contact_from_keywords(), extract_html(), extract_timestamp(), find_conversation(), find_uids_to_download(), find_uids_to_import(), get_email_body(), login_to_server(), parse_multipart_email(), parse_singlepart_email(), parse_xml(), select_chats_folder() and synchronize()
Properties: chats_folder, client, conversation_map, imap_server and password
chats_folder[source]

The folder that contains chat message archives (a string, defaults to ‘[Gmail]/Chats’).

Note

The chats_folder property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

client[source]

An IMAP client connection to imap_server.

Note

The client property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

conversation_map[source]

A mapping of conversations.

Note

The conversation_map property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

imap_server[source]

The domain name of the Google Mail IMAP server (a string, defaults to ‘imap.gmail.com’).

Note

The imap_server property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

password[source]

The password used to sign in to the Google Mail account (a string).

Note

The password property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

synchronize()[source]

Download RFC822 encoded Google Talk conversations using IMAP and import the embedded chat messages.

login_to_server()[source]

Log-in to the Google Mail account.

select_chats_folder()[source]

Select the IMAP folder with chat messages.

find_uids_to_download()[source]

Determine the UIDs of the email messages to be downloaded.

find_uids_to_import()[source]

Determine which email messages need to be imported.

get_email_body(uid)[source]

Get the body of an email from the local cache or the server.

parse_singlepart_email(email)[source]

Extract a chat message from a single-part email downloaded from chats_folder.

parse_multipart_email(email)[source]

Find the text/xml payload in an RFC 822 multi-part email message.

parse_xml(xml_body, conversation)[source]

Extract chat messages from the text/xml payload.

find_conversation(*participants)[source]

Find a conversation (without an external ID) that involves the given participants.

extract_timestamp(message_node)[source]

Extract a timestamp from a <message> node.

Parameters:message_node – A <message> node.
Returns:A datetime.datetime object.
extract_html(message_node)[source]

Try to extract HTML from a <message> node.

Parameters:message_node – A <message> node.
Returns:The extracted HTML (a string) or None.
contact_from_jid(value)[source]

Convert a Jabber ID to an email address and use that to find or create a contact.

contact_from_keywords(keywords)[source]

Try to find a unique contact based on the given keywords.

contact_from_header(value)[source]

Get or create a contact based on the From: or To: header of an email.

check_response(response, message, *args, **kw)[source]

Validate an IMAP server response.

class chat_archive.backends.gtalk.EmailMessageParser(**kw)[source]

Lazy evaluation of email.message_from_string().

When you initialize an EmailMessageParser object you are required to provide values for the raw_body and uid properties. You can set the values of the raw_body and uid properties by passing keyword arguments to the class initializer.

Here’s an overview of the EmailMessageParser class:

Superclass: PropertyManager
Properties: parsed_body, raw_body, timestamp and uid
parsed_body[source]

The result of email.message_from_string().

Note

The parsed_body property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

raw_body[source]

The raw message body of the email (a string).

Note

The raw_body property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named raw_body (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

timestamp[source]

Convert the Date: header of the email message to a datetime object.

Note

The timestamp property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

uid[source]

The UID of the email message.

Note

The uid property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named uid (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.
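
The idea behind lazily parsing an email can be sketched with the standard library alone. The LazyEmail class below is a hypothetical illustration (it is not EmailMessageParser) and the sample message is made up.

```python
import email
from email.utils import parsedate_to_datetime


class LazyEmail(object):
    """Hypothetical illustration of parsing an email only when needed."""

    def __init__(self, raw_body, uid):
        self.raw_body = raw_body
        self.uid = uid
        self._parsed_body = None

    @property
    def parsed_body(self):
        # Parse on first access and reuse the result afterwards.
        if self._parsed_body is None:
            self._parsed_body = email.message_from_string(self.raw_body)
        return self._parsed_body

    @property
    def timestamp(self):
        # Convert the Date: header to a datetime.datetime object.
        return parsedate_to_datetime(self.parsed_body["Date"])


message = LazyEmail(
    raw_body="Date: Mon, 04 Jun 2018 12:30:00 +0000\n\nHello",
    uid=42,
)
parsed_before_access = message._parsed_body
timestamp = message.timestamp
```

Nothing is parsed until parsed_body or timestamp is first accessed, which avoids the parsing cost for messages that turn out not to need importing.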

class chat_archive.backends.gtalk.LazyXMLFormatter(node)[source]

Lazy evaluation of xml.etree.ElementTree.tostring().

__init__(node)[source]

Initialize a LazyXMLFormatter object.

Parameters:node – The XML node to render.
__bytes__()[source]

Convert the XML node to a byte string.

__str__()[source]

Convert the XML node to a string.
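
A plausible use for this kind of wrapper is logging: the XML is only rendered when a log message is actually emitted. The following sketch assumes that use case; LazyFormatter is a hypothetical stand-in for LazyXMLFormatter.

```python
import logging
import xml.etree.ElementTree as ET


class LazyFormatter(object):
    """Render an XML node only when it is converted to bytes or text."""

    def __init__(self, node):
        self.node = node

    def __bytes__(self):
        return ET.tostring(self.node)

    def __str__(self):
        return ET.tostring(self.node, encoding="unicode")


node = ET.fromstring("<message><body>Hello</body></message>")
# With the default log level this message is discarded before formatting,
# so the node is never rendered.
logging.getLogger(__name__).debug("Parsing node: %s", LazyFormatter(node))
rendered = str(LazyFormatter(node))
```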

chat_archive.backends.hangouts

Synchronization logic for the Google Hangouts backend of the chat-archive program.

chat_archive.backends.hangouts.FRIENDLY_NAME = 'Google Hangouts'

A user friendly name for the chat service supported by this backend (a string).

class chat_archive.backends.hangouts.HangoutsBackend(**kw)[source]

The Google Hangouts backend for the chat-archive program.

This backend supports the following configuration options:

Option         Description
email-address  The email address used to sign in to your Google account.
password-name  The name of a password in ~/.password-store to use.
password       The password used to sign in to your Google account.

If you set password-name then password doesn’t have to be set. If neither password nor password-name has been set then you will be prompted for your password every time you synchronize.

You can set the values of the cookie_file and retry_count properties by passing keyword arguments to the class initializer.

Here’s an overview of the HangoutsBackend class:

Superclass: ChatArchiveBackend
Public methods: connect_then_sync(), download_all_contacts(), download_all_conversations(), download_all_messages(), download_conversation(), download_message_batch(), get_message_html(), handle_import_errors(), is_bogus_user(), perform_initial_sync() and synchronize()
Properties: bogus_user_ids, client, cookie_file and retry_count
bogus_user_ids[source]

A set of strings with ‘gaia_id’ values of “bogus” users.

Note

The bogus_user_ids property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

cookie_file[source]

The pathname of the *.json file with cached credentials (a string).

Note

The cookie_file property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

client[source]

The hangups client object.

Note

The client property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

retry_count[source]

The number of times that a batch of messages will be requested (a number, defaults to 5).

Note

The retry_count property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

synchronize()[source]

Download chat contacts and messages and store them in the local archive.

download_all_contacts(user_list)[source]

Download contact details from Google Hangouts.

get_message_html(event)[source]

Get the formatted text of a chat message as HTML.

is_bogus_user(user)[source]

Ignore default / unknown users made up by hangups.

connect_then_sync()[source]

Connect to the Hangouts service and start the synchronization.

download_all_conversations(conversation_list)[source]

Download conversations from Google Hangouts.

download_all_messages(conversation, conversation_in_db, event_id=None)[source]

Download the messages in a specific Hangouts conversation.

download_conversation(conversation)[source]

Download a single Google Hangouts conversation.

download_message_batch(conversation, event_id)[source]

Try to download a batch of messages (retrying according to retry_count).
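
A retry loop along these lines might look as follows. This is a sketch, not the backend's implementation: download_with_retries and flaky_fetch are hypothetical, and treating every exception as retryable is an assumption.

```python
def download_with_retries(fetch_batch, retry_count=5):
    """Call fetch_batch() up to retry_count times and return its result."""
    if retry_count < 1:
        raise ValueError("retry_count must be at least 1")
    last_error = None
    for attempt in range(retry_count):
        try:
            return fetch_batch()
        except Exception as error:
            # Remember the error so it can be raised if all attempts fail.
            last_error = error
    raise last_error


attempts = []


def flaky_fetch():
    """Fail twice with a simulated server error, then succeed."""
    attempts.append(1)
    if len(attempts) < 3:
        raise IOError("temporary server error")
    return ["message 1", "message 2"]


batch = download_with_retries(flaky_fetch)
```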

handle_import_errors(conversation, conversation_in_db, event_id=None)[source]

Download messages in a conversation, handling synchronization errors.

perform_initial_sync(conversation, conversation_in_db)[source]

Perform the initial synchronization to the start of a conversation.

class chat_archive.backends.hangouts.GoogleAccountCredentials(**kw)[source]

Used to non-interactively provide Google Account credentials to hangups.

When you initialize a GoogleAccountCredentials object you are required to provide values for the email_address and password properties. You can set the values of the email_address and password properties by passing keyword arguments to the class initializer.

Here’s an overview of the GoogleAccountCredentials class:

Superclass: PropertyManager
Public methods: get_email(), get_password() and get_verification_code()
Properties: email_address and password
email_address[source]

The Google account email address (a string).

Note

The email_address property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named email_address (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

password[source]

The Google account password (a string).

Note

The password property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named password (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

get_email()[source]

Feed the configured email_address to hangups.

get_password()[source]

Feed the configured password to hangups.

get_verification_code()[source]

Prompt the operator for a verification code.

chat_archive.backends.slack

Synchronization logic for the Slack backend of the chat-archive program.

chat_archive.backends.slack.FRIENDLY_NAME = 'Slack'

A user friendly name for the chat service supported by this backend (a string).

class chat_archive.backends.slack.SlackBackend(**kw)[source]

Container for the Slack chat archive backend.

You can set the value of the is_limited property by passing a keyword argument to the class initializer.

Here’s an overview of the SlackBackend class:

Superclass: ChatArchiveBackend
Public methods: expand_reference_callback(), get_history(), import_messages(), synchronize(), synchronize_channels(), synchronize_direct_messages() and synchronize_users()
Properties: api_token, client, http_session, is_limited, mrkdwn_to_html and spinner
api_token[source]

The Slack API token (a string).

Note

The api_token property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

client[source]

A slacker.Slacker instance initialized with api_token and http_session.

Note

The client property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

is_limited[source]

Whether result sets have been limited due to the free plan.

Note

The is_limited property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

mrkdwn_to_html[source]

An HTMLConverter object.

Note

The mrkdwn_to_html property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

http_session[source]

A requests.Session object used for HTTP connection re-use.

Note

The http_session property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

spinner[source]

An interactive spinner to provide feedback to the user (because the Slack backend is slow).

Note

The spinner property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

synchronize()[source]

Download chat contacts and messages and store them in the local archive.

synchronize_users()[source]

Download information about the users in the organization on Slack.

synchronize_direct_messages()[source]

Download the latest direct messages from Slack.

synchronize_channels()[source]

Download messages from named channels.

import_messages(source, conversation_in_db)[source]

Import the history of the given Slack channel.

get_history(source, channel_id, latest=None, oldest=0, page_size=100)[source]

Get the history of the given Slack channel.

expand_reference_callback(external_id)[source]

Expand a @reference to a Slack user in a chat message with the name of that user.

class chat_archive.backends.slack.HTMLConverter(expand_reference_callback=None)[source]

Convert Slack chat messages from mrkdwn format to HTML.

__init__(expand_reference_callback=None)[source]

Initialize an HTMLConverter object.

__call__(text)[source]

Convert a Slack chat message to HTML.

Parameters:text – The text of a Slack message (a string).
Returns:The generated HTML (a string).
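
To give an idea of the conversion, here is a deliberately simplified sketch based on regular expressions. The method names below suggest that HTMLConverter itself is a hand-written parser, so this is a swapped-in technique rather than the actual approach. It ignores entities, references and pre-formatted text, and the choice of HTML tags is an assumption.

```python
import re

# Each marker must start at a non-word boundary and wrap non-blank text.
RULES = [
    (re.compile(r"(?<!\w)\*(\S(?:.*?\S)?)\*(?!\w)"), r"<b>\1</b>"),
    (re.compile(r"(?<!\w)_(\S(?:.*?\S)?)_(?!\w)"), r"<i>\1</i>"),
    (re.compile(r"(?<!\w)~(\S(?:.*?\S)?)~(?!\w)"), r"<s>\1</s>"),
]


def mrkdwn_to_html(text):
    """Convert bold, italic and strike-through markers to HTML tags."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


converted = mrkdwn_to_html("*bold* and _italic_ and ~gone~")
unchanged = mrkdwn_to_html("snake_case_name and 2 * 3 * 4")
```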
followed_by_alphanumeric(input, index, limit)[source]

Check if the given position is followed by an alphanumeric character.

parse_bold(input, index, length, output)[source]

Parse bold text.

parse_entity(input, index, length, output)[source]

Parse an HTML entity.

parse_italic(input, index, length, output)[source]

Parse _italic_ text.

parse_preformatted(input, index, length, output)[source]

Parse pre-formatted text.

parse_preformatted_body(input, index, length, output)[source]

Parse the body of a pre-formatted text fragment.

parse_reference(input, index, length, output)[source]

Parse a reference to a URL, user or channel.

parse_strike_through(input, index, length, output)[source]

Parse ~strike-through~ text.

parse_text(input, index, length, output)[source]

Parse inline text.

preceded_by_alphanumeric(input, index)[source]

Check if the given position is preceded by an alphanumeric character.

chat_archive.backends.telegram

Synchronization logic for the Telegram backend of the chat-archive program.

The use of this backend requires the user to register on my.telegram.org/apps to get an api_id and api_hash.

chat_archive.backends.telegram.FRIENDLY_NAME = 'Telegram'

A user friendly name for the chat service supported by this backend (a string).

class chat_archive.backends.telegram.TelegramBackend(**kw)[source]

Container for the Telegram chat archive backend.

When you initialize a TelegramBackend object you are required to provide values for the api_hash and api_id properties. You can set the values of the api_hash, api_id and session_file properties by passing keyword arguments to the class initializer.

Here’s an overview of the TelegramBackend class:

Superclass: ChatArchiveBackend
Public methods: connect_then_sync(), dialog_to_ignore(), download_messages(), is_duplicate_dialog(), is_group_conversation(), is_service_dialog(), perform_initial_sync(), recipient_to_contact(), sender_to_contact(), synchronize() and update_conversation()
Properties: api_hash, api_id, client and session_file
api_hash[source]

The API hash used to connect to the Telegram API (a string).

The value of this property can be configured as follows:

[telegram]
api-hash = ...

You can use the api-hash-name configuration file option to specify the name of a secret in ~/.password-store instead.

Note

The api_hash property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named api_hash (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

api_id[source]

The API ID used to connect to the Telegram API (an integer).

The value of this property can be configured as follows:

[telegram]
api-id = ...

You can use the api-id-name configuration file option to specify the name of a secret in ~/.password-store instead.

Note

The api_id property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named api_id (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

client[source]

A telethon.TelegramClient object constructed based on api_id, api_hash and session_file.

Note

The client property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

session_file[source]

The filename of the session file passed to telethon.TelegramClient.

Note

The session_file property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

synchronize()[source]

Download chat contacts and messages and store them in the local archive.

dialog_to_ignore(dialog)[source]

Check if this conversation should be ignored.

This method exists to exclude two types of conversations:

  • The conversation with the “Telegram” user, because I don’t consider the service messages in this conversation to be relevant to my chat archive.
  • Group conversations that are being synchronized as part of a different Telegram account.
is_duplicate_dialog(dialog)[source]

Check if the given dialog is being synchronized as part of a different Telegram account.

is_group_conversation(dialog)[source]

Determine whether the given dialog is a group conversation.

is_service_dialog(dialog)[source]

Check if the given dialog is the dialog with the “Telegram” user, containing service messages.

connect_then_sync()[source]

Connect to the Telegram API and synchronize the available conversations.

download_messages(dialog, conversation_in_db, min_id=0, max_id=0)[source]

Download messages in the given conversation.

perform_initial_sync(dialog, conversation_in_db)[source]

Start or resume the initial synchronization.

update_conversation(dialog, conversation_in_db)[source]

Download new messages in an existing conversation.

sender_to_contact(user)[source]

Create a contact in our local database for the given Telegram user.

recipient_to_contact(to_id)[source]

Create a contact in our local database for the given to_id value.

chat_archive.cli

Usage: chat-archive [OPTIONS] [COMMAND]

Easy to use offline chat archive that can gather chat message history from Google Talk, Google Hangouts, Slack and Telegram.

Supported commands:

  • The ‘sync’ command downloads new chat messages from supported chat services and stores them in the local archive (an SQLite database).
  • The ‘search’ command searches the chat messages in the local archive for the given keyword(s) and lists matching messages.
  • The ‘list’ command lists all messages in the local archive.
  • The ‘stats’ command shows statistics about the local archive.
  • The ‘unknown’ command searches for conversations that contain messages from an unknown sender and allows you to enter the name of a new contact to associate with all of the messages from an unknown sender. Conversations involving multiple unknown senders are not supported.

Supported options:

Option Description
-C, --context=COUNT Print COUNT messages of output context during ‘chat-archive search’. This works similarly to ‘grep -C’. The default value of COUNT is 3.
-f, --force Retry synchronization of conversations where errors were previously encountered. This option is currently only relevant to the Google Hangouts backend, because I kept getting server errors when synchronizing a few specific conversations and I didn’t want to keep seeing each of those errors during every synchronization run :-).
-c, --color=CHOICE, --colour=CHOICE

Specify whether ANSI escape sequences for text and background colors and text styles are to be used or not, depending on the value of CHOICE:

  • The values ‘always’, ‘true’, ‘yes’ and ‘1’ enable colors.
  • The values ‘never’, ‘false’, ‘no’ and ‘0’ disable colors.
  • When the value is ‘auto’ (this is the default) then colors will only be enabled when an interactive terminal is detected.
-l, --log-file=LOGFILE Save logs at DEBUG verbosity to the filename given by LOGFILE. This option was added to make it easy to capture the log output of an initial synchronization that will be downloading thousands of messages.
-p, --profile=FILENAME Enable profiling of the chat-archive application to make it possible to analyze performance problems. Python profiling data will be saved to FILENAME every time database changes are committed (making it possible to inspect the profile while the program is still running).
-v, --verbose Increase logging verbosity (can be repeated).
-q, --quiet Decrease logging verbosity (can be repeated).
-h, --help Show this message and exit.
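
The handling of CHOICE for the --color option described above can be sketched as follows. The parse_color_choice function is hypothetical, and rejecting unknown values with ValueError is an assumption.

```python
import io
import sys


def parse_color_choice(choice, stream=None):
    """Translate a --color=CHOICE value into a boolean."""
    value = choice.strip().lower()
    if value in ("always", "true", "yes", "1"):
        return True
    if value in ("never", "false", "no", "0"):
        return False
    if value == "auto":
        # Only enable colors when writing to an interactive terminal.
        stream = stream if stream is not None else sys.stdout
        return hasattr(stream, "isatty") and stream.isatty()
    raise ValueError("Unsupported color choice: %r" % choice)


always = parse_color_choice("always")
never = parse_color_choice("no")
auto_on_file = parse_color_choice("auto", io.StringIO())
```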
chat_archive.cli.FORMATTING_TEMPLATES = {'conversation_delimiter': '<span style="color: green">{text}</span>', 'conversation_name': '<span style="font-weight: bold; color: #FCE94F">{text}</span>', 'keyword_highlight': '<span style="color: black; background-color: yellow">{text}</span>', 'message_backend': '<span style="color: #C4A000">({text})</span>', 'message_contacts': '<span style="color: blue">{text}</span>', 'message_delimiter': '<span style="color: #555753">{text}</span>', 'message_timestamp': '<span style="color: green">{text}</span>'}

The formatting of output, specified as HTML with placeholders.

chat_archive.cli.UNKNOWN_CONTACT_LABEL = 'Unknown'

The label for contacts without a name or email address (a string).

chat_archive.cli.main()[source]

Command line interface for the chat-archive program.

class chat_archive.cli.UserInterface(*args, **kw)[source]

The Python API for the command line interface for the chat-archive program.

You can set the values of the context, keywords, timestamp_format and use_colors properties by passing keyword arguments to the class initializer.

Here’s an overview of the UserInterface class:

Superclass: ChatArchive
Public methods: gather_context(), generate_html(), get_contact_name(), list_cmd(), normalize_whitespace(), prepare_output(), render_backend(), render_contacts(), render_conversation_summary(), render_messages(), render_output(), render_text(), render_timestamp(), search_cmd(), stats_cmd(), sync_cmd() and unknown_cmd()
Properties: context, html_to_ansi, html_to_text, keyword_highlighter, keywords, redirect_stripper, timestamp_format and use_colors
context[source]

The number of messages of output context to print during searches (defaults to 3).

Note

The context property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

use_colors[source]

Whether to output ANSI escape sequences for text colors and styles (a boolean).

Note

The use_colors property is a custom_property. You can change the value of this property using normal attribute assignment syntax. This property’s value is computed once (the first time it is accessed) and the result is cached. To clear the cached value you can use del or delattr().

html_to_ansi[source]

An HTMLConverter object that uses normalize_emoji() as a text pre-processing callback.

Note

The html_to_ansi property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

redirect_stripper[source]

A RedirectStripper object.

Note

The redirect_stripper property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

html_to_text[source]

An HTMLStripper object.

Note

The html_to_text property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

keyword_highlighter[source]

A KeywordHighlighter object based on keywords.

Note

The keyword_highlighter property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

keywords[source]

A list of strings with search keywords.

Note

The keywords property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

timestamp_format[source]

The format of timestamps (defaults to %Y-%m-%d %H:%M:%S).

Note

The timestamp_format property is a mutable_property. You can change the value of this property using normal attribute assignment syntax. To reset it to its default (computed) value you can use del or delattr().

list_cmd(arguments)[source]

List all messages in the local archive.

search_cmd(arguments)[source]

Search the chat messages in the local archive for the given keyword(s).

stats_cmd(arguments)[source]

Show some statistics about the local chat archive.

sync_cmd(arguments)[source]

Download new chat messages from the supported services.

unknown_cmd(arguments)[source]

Find private conversations with messages from an unknown sender and interactively prompt the operator to provide a name for a new contact to associate the messages with.

generate_html(name, text)[source]

Generate HTML based on a named format string.

Parameters:
  • name – The name of an HTML format string in FORMATTING_TEMPLATES (a string).
  • text – The text to interpolate (a string).
Returns:

The generated HTML (a string).

This method does not escape the text given to it, in other words it is up to the caller to decide whether embedded HTML is allowed or not.
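
A minimal sketch of the idea, assuming FORMATTING_TEMPLATES maps names to format strings with a {text} placeholder. The template names and markup below are hypothetical, not the ones shipped with chat-archive. It shows how a caller can decide whether embedded HTML is allowed:

```python
import html

# Hypothetical templates; the real FORMATTING_TEMPLATES may look different.
FORMATTING_TEMPLATES = {
    "keyword": "<b>{text}</b>",
    "timestamp": "<i>{text}</i>",
}


def generate_html(name, text):
    """Interpolate text into the named template without escaping it."""
    return FORMATTING_TEMPLATES[name].format(text=text)


# The caller escapes untrusted text before interpolation ...
print(generate_html("keyword", html.escape("1 < 2")))
# ... or passes trusted markup through unchanged.
print(generate_html("timestamp", "<u>markup kept</u>"))
```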

gather_context(messages)[source]

Enhance search results with context (surrounding messages).

render_messages(messages)[source]

Render the given message(s) on the terminal.

normalize_whitespace(text)[source]

Normalize the whitespace in a chat message before rendering on the terminal.

Parameters:text – The chat message text (a string).
Returns:The normalized text (a string).

This method works as follows:

  • First leading and trailing whitespace is stripped from the text.
  • When the resulting text consists of a single line, it is processed using compact() and returned.
  • When the resulting text contains multiple lines the text is prefixed with a newline character, so that the chat message starts on its own line. This ensures that messages requiring vertical alignment render properly (for example a table drawn with | and - characters).
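
The steps above can be sketched as follows. This is an illustration only: the compact() helper here is a simplified stand-in that collapses runs of whitespace, which is an assumption about what the real helper does.

```python
def compact(text):
    """Simplified stand-in: collapse every run of whitespace into one space."""
    return " ".join(text.split())


def normalize_whitespace(text):
    """Follow the three steps described above."""
    stripped = text.strip()
    if len(stripped.splitlines()) <= 1:
        # Single line (or empty): compact it and return it.
        return compact(stripped)
    # Multiple lines: start the message on its own line to keep alignment.
    return "\n" + stripped


print(repr(normalize_whitespace("  hello    world  ")))
print(repr(normalize_whitespace("a | b\n--+--\nc | d\n")))
```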
render_conversation_summary(conversation)[source]

Render a summary of which conversation a message is part of.

render_contacts(message)[source]

Render a human friendly representation of a message’s contact(s).

prepare_output(text)[source]

Prepare text for rendering on the terminal.

Parameters:text – The HTML text to render (a string).
Returns:The rendered text (a string).

When use_colors is True this method first uses keyword_highlighter to highlight search matches in the given text and then it converts the string from HTML to ANSI escape sequences using html_to_ansi.

When use_colors is False then html_to_text is used to convert the given HTML to plain text. In this case keyword highlighting is skipped.
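
The branching described above can be sketched with the three converters passed in as plain callables. The toy converters below are hypothetical stand-ins for keyword_highlighter, html_to_ansi and html_to_text, chosen only to make the sketch runnable:

```python
import re


def prepare_output(text, use_colors, highlight, to_ansi, to_text):
    """Choose the rendering pipeline based on use_colors."""
    if use_colors:
        # Highlight search matches first, then convert HTML to ANSI.
        return to_ansi(highlight(text))
    # Without colors the highlighting step is skipped entirely.
    return to_text(text)


# Toy stand-ins (assumptions, not the real converters).
def highlight(text):
    return text.replace("cat", "<b>cat</b>")


def to_ansi(text):
    return text.replace("<b>", "\x1b[1m").replace("</b>", "\x1b[0m")


def to_text(text):
    return re.sub(r"<[^>]+>", "", text)


print(repr(prepare_output("a <i>cat</i>", False, highlight, to_ansi, to_text)))
```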

render_output(text)[source]

Render text on the terminal.

Parameters:text – The HTML text to render (a string).

Refer to prepare_output() for details about how text is converted from HTML to text with ANSI escape sequences.

get_contact_name(contact)[source]

Get a short string describing a contact (preferably their first name, but if that is not available then their email address will have to do). If no useful information is available UNKNOWN_CONTACT_LABEL is returned so as to explicitly mark the absence of more information.
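
A sketch of the fallback order described above. The Contact tuple and the value of UNKNOWN_CONTACT_LABEL are hypothetical; in the real models email addresses are EmailAddress objects rather than plain strings.

```python
from collections import namedtuple

UNKNOWN_CONTACT_LABEL = "Unknown"  # hypothetical value, for illustration only
Contact = namedtuple("Contact", "first_name email_addresses")


def get_contact_name(contact):
    """Prefer the first name, fall back to an email address, else the label."""
    if contact is not None:
        if contact.first_name:
            return contact.first_name
        if contact.email_addresses:
            return contact.email_addresses[0]
    return UNKNOWN_CONTACT_LABEL


print(get_contact_name(Contact("Alice", ["alice@example.com"])))
print(get_contact_name(Contact(None, ["bob@example.com"])))
print(get_contact_name(None))
```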

render_text(message)[source]

Prepare the text of a chat message for rendering on the terminal.

render_timestamp(value)[source]

Render a human friendly representation of a timestamp.

render_backend(value)[source]

Render a human friendly representation of a chat message backend.

chat_archive.database

SQLAlchemy based database helpers.

class chat_archive.database.DatabaseClient(*args, **kw)[source]

Simple wrapper for SQLAlchemy that makes it easy to use with SQLite.

When you initialize a DatabaseClient object you are required to provide a value for the database_url property. You can set the values of the database_file, database_url and echo_queries properties by passing keyword arguments to the class initializer.

Here’s an overview of the DatabaseClient class:

Superclass: ProfileManager
Special methods: __exit__() and __init__()
Public methods: commit_changes()
Properties: database_engine, database_file, database_url, echo_queries, session and session_factory

__init__(*args, **kw)[source]

Initialize a DatabaseClient object.

Please refer to the PropertyManager documentation for details about the handling of arguments.

database_engine[source]

An SQLAlchemy database engine connected to database_url.

Note

The database_engine property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

database_file[source]

The absolute pathname of an SQLite database file (a string or None).

Note

The database_file property is a writable_property. You can change the value of this property using normal attribute assignment syntax.

database_url[source]

A URL that indicates the database dialect and connection arguments to SQLAlchemy (a string).

The value of database_url defaults to a URL that instructs SQLAlchemy to use an SQLite 3 database file located at the pathname given by database_file, but of course you are free to point SQLAlchemy to any supported database server.
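
As an illustration of such a default, SQLAlchemy's SQLite URLs take the form sqlite:/// followed by the pathname, so an absolute UNIX pathname results in four slashes. The helper name below is hypothetical and the sketch assumes a UNIX style pathname:

```python
import os


def default_database_url(database_file):
    """Build an SQLAlchemy URL for an SQLite database file (hypothetical helper)."""
    return "sqlite:///%s" % os.path.abspath(database_file)


# On a UNIX system an absolute pathname yields four slashes in total.
print(default_database_url("/tmp/chat-archive-example.sqlite3"))
```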

Note

The database_url property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named database_url (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

echo_queries[source]

Whether queries should be logged to sys.stderr (a boolean, defaults to False).

Note

The echo_queries property is a writable_property. You can change the value of this property using normal attribute assignment syntax.

session[source]

An SQLAlchemy session created by session_factory.

Note

The session property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

session_factory[source]

An SQLAlchemy session factory connected to database_engine.

Note

The session_factory property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

__exit__(exc_type=None, exc_value=None, traceback=None)[source]

Automatically commit database changes when the with block ends.

commit_changes()[source]

Commit database changes to disk.

class chat_archive.database.SchemaManager(*args, **kw)[source]

Easy to use database schema upgrades based on Alembic.

You can set the values of the alembic_directory, auto_create_schema, auto_upgrade_schema and declarative_base properties by passing keyword arguments to the class initializer.

Here’s an overview of the SchemaManager class:

Superclass: DatabaseClient
Special methods: __init__()
Public methods: initialize_schema() and run_migrations()
Properties: alembic_config, alembic_directory, auto_create_schema, auto_upgrade_schema, current_schema_revision, declarative_base, latest_schema_revision and schema_up_to_date

__init__(*args, **kw)[source]

Initialize a SchemaManager object.

This method automatically calls run_migrations() (and initialize_schema() when the database is initially created) to ensure that the database schema is up to date.

alembic_config[source]

A minimal Alembic configuration object.

This configuration object contains two options:

Raises:ValueError when alembic_directory isn’t set.

Note

The alembic_config property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

alembic_directory[source]

The absolute pathname of the directory containing Alembic’s env.py file (a string or None).

Note

The alembic_directory property is a writable_property. You can change the value of this property using normal attribute assignment syntax.

auto_create_schema[source]

True if automatic database schema initialization is enabled, False otherwise.

This defaults to True when declarative_base is set, False otherwise.

Note

The auto_create_schema property is a writable_property. You can change the value of this property using normal attribute assignment syntax.

auto_upgrade_schema[source]

True if automatic database schema upgrades are enabled, False otherwise.

This defaults to True when alembic_directory is set, False otherwise.

Note

The auto_upgrade_schema property is a writable_property. You can change the value of this property using normal attribute assignment syntax.

current_schema_revision[source]

The current database schema revision in the database that we’re connected to (a string or None).

Note

The current_schema_revision property is a cached_property. This property’s value is computed once (the first time it is accessed) and the result is cached. To clear the cached value you can use del or delattr().
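
The caching behavior described in this note can be illustrated with functools.cached_property from the standard library (Python 3.8+). This is an analogue for illustration, not the property type that chat-archive uses, and the Example class is hypothetical:

```python
from functools import cached_property


class Example:
    """Hypothetical class that counts how often the value is computed."""

    def __init__(self):
        self.computations = 0

    @cached_property
    def current_schema_revision(self):
        self.computations += 1
        return "revision-%d" % self.computations


obj = Example()
obj.current_schema_revision      # computed on first access
obj.current_schema_revision      # served from the cache
print(obj.computations)
del obj.current_schema_revision  # clear the cached value
print(obj.current_schema_revision)
```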

declarative_base[source]

The base class for declarative models defined using SQLAlchemy.

Note

The declarative_base property is a writable_property. You can change the value of this property using normal attribute assignment syntax.

latest_schema_revision[source]

The latest schema revision according to Alembic’s migration scripts (a string).

Note

The latest_schema_revision property is a lazy_property. This property’s value is computed once (the first time it is accessed) and the result is cached.

schema_up_to_date

True if the database schema is up to date, False otherwise.

initialize_schema()[source]

Initialize the database schema using SQLAlchemy.

This method is automatically called when a SchemaManager object is created. In order to initialize the database schema the declarative_base property needs to be set, but if it’s not set then initialize_schema() won’t complain.

run_migrations()[source]

Upgrade the database schema using Alembic.

This method is automatically called when a SchemaManager object is created. In order to upgrade the database schema the alembic_directory property needs to be set, but if it’s not set then run_migrations() won’t complain.

class chat_archive.database.CustomVerbosity(**kw)[source]

Easily customize logging verbosity for a given scope.

This is used by SchemaManager to silence Alembic because it’s rather verbose by default, presumably because its primary purpose is to be a command line program and not a library embedded in an application.

When you initialize a CustomVerbosity object you are required to provide a value for the level property. You can set the values of the level and original_level properties by passing keyword arguments to the class initializer.

Here’s an overview of the CustomVerbosity class:

Superclass: PropertyManager
Special methods: __enter__() and __exit__()
Properties: level and original_level

level[source]

The overridden logging verbosity level.

Note

The level property is a required_property. You are required to provide a value for this property by calling the constructor of the class that defines the property with a keyword argument named level (unless a custom constructor is defined, in this case please refer to the documentation of that constructor). You can change the value of this property using normal attribute assignment syntax.

original_level[source]

The original logging verbosity level.

Note

The original_level property is a writable_property. You can change the value of this property using normal attribute assignment syntax.

__enter__()[source]

Customize the logging verbosity when entering the with block.

__exit__(exc_type=None, exc_value=None, traceback=None)[source]

Restore the original logging verbosity when leaving the with block.
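
A sketch of the same pattern using only the standard library logging module. The class name and the choice to adjust a named logger are assumptions; the real CustomVerbosity class may change the verbosity in a different way.

```python
import logging


class TemporaryLogLevel:
    """Hypothetical context manager that overrides a logger's level."""

    def __init__(self, level, logger_name="alembic"):
        self.logger = logging.getLogger(logger_name)
        self.level = level
        self.original_level = None

    def __enter__(self):
        # Remember the current level so that it can be restored later.
        self.original_level = self.logger.level
        self.logger.setLevel(self.level)
        return self

    def __exit__(self, exc_type=None, exc_value=None, traceback=None):
        # Restore the original level, also when an exception was raised.
        self.logger.setLevel(self.original_level)


logger = logging.getLogger("alembic")
logger.setLevel(logging.INFO)
with TemporaryLogLevel(level=logging.WARNING):
    logger.info("This message is silenced inside the with block.")
```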

chat_archive.emoji

Utility functions to translate between various forms of smilies and emoji.

chat_archive.emoji.normalize_emoji(text)[source]

Translate textual smilies, hollow smilies and macros to color emoji.

chat_archive.html

Utility functions for working with HTML encoded text.

chat_archive.html.BLOCK_TAGS = ['div', 'p', 'pre']

A list of strings with HTML tags that are considered block-level elements. The HTMLStripper emits an empty line before and after each block-level element that it encounters.

chat_archive.html.URL_PATTERN = re.compile('(http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+)')

A compiled regular expression pattern to find URLs in text (credit: taken from urlregex.com).

chat_archive.html.html_to_text(html_text)[source]

Convert HTML to plain text.

Parameters:html_text – A fragment of HTML (a string).
Returns:The plain text (a string).

This function uses the HTMLStripper class that builds on top of the html.parser.HTMLParser class in the Python standard library.

chat_archive.html.text_to_html(text, callback=None)[source]

Convert plain text to HTML.

Parameters:
  • text – A fragment of plain text (a string).
  • callback – An optional callback that provides the caller a chance to pre-process text before it is encoded as HTML.
Returns:

The HTML encoded text (a string).

This function replaces URLs with <a href="..."> tags and escapes special characters, that’s it, nothing more.
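
A minimal sketch of this behavior, built on the URL_PATTERN shown above. Splitting on the capturing group and applying the callback only to non-URL text are assumptions of this sketch, not confirmed details of the real function.

```python
import html
import re

URL_PATTERN = re.compile(
    r"(http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+)"
)


def text_to_html(text, callback=None):
    """Escape special characters and turn URLs into hyperlinks."""
    parts = []
    # re.split() with one capturing group alternates plain text and URLs,
    # so tokens at odd positions are always URLs.
    for index, token in enumerate(URL_PATTERN.split(text)):
        if index % 2 == 1:
            url = html.escape(token, quote=True)
            parts.append('<a href="%s">%s</a>' % (url, url))
        else:
            if callback is not None:
                token = callback(token)
            parts.append(html.escape(token, quote=False))
    return "".join(parts)


print(text_to_html("See https://example.com/a?b=1&c=2 now"))
```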

class chat_archive.html.HTMLStripper(*, convert_charrefs=True)[source]

A simple HTML to text converter based on html.parser.HTMLParser.

__call__(data)[source]

Convert HTML to text.

Parameters:data – The HTML to convert to text (a string).
Returns:The converted text (a string).

This method calls compact_empty_lines() on the converted text to normalize superfluous empty lines caused by vertical whitespace emitted around block level elements like <div>, <p> and <pre>.

handle_charref(value)[source]

Process a decimal or hexadecimal numeric character reference.

Parameters:value – The decimal or hexadecimal value (a string).

handle_data(data)[source]

Capture decoded text data.

handle_endtag(tag)[source]

Emit empty lines around block level elements.

handle_entityref(name)[source]

Process a named character reference.

Parameters:name – The name of the character reference (a string).

handle_starttag(tag, attrs)[source]

Translate <br> tags to line breaks.

reset()[source]

Reset the state of the HTMLStripper instance.
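
The handlers documented above can be combined into a small converter along these lines. This is a sketch, not the real HTMLStripper: the class name is hypothetical, it relies on convert_charrefs=True instead of separate character reference handlers, and a regular expression stands in for compact_empty_lines().

```python
import re
from html.parser import HTMLParser

BLOCK_TAGS = ["div", "p", "pre"]


class SketchStripper(HTMLParser):
    """Hypothetical HTML to text converter based on html.parser.HTMLParser."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # calls reset() implicitly

    def reset(self):
        super().reset()
        self.output = []

    def __call__(self, data):
        self.reset()
        self.feed(data)
        self.close()
        text = "".join(self.output)
        # Stand-in for compact_empty_lines(): allow at most one empty line.
        return re.sub(r"\n{3,}", "\n\n", text).strip()

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.output.append("\n")
        elif tag in BLOCK_TAGS:
            self.output.append("\n\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.output.append("\n\n")

    def handle_data(self, data):
        self.output.append(data)


stripper = SketchStripper()
print(stripper("<p>Hello &amp; welcome</p><p>Second<br>line</p>"))
```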

chat_archive.html.keywords

Utility functions for working with HTML encoded text.

class chat_archive.html.keywords.KeywordHighlighter(*args, **kw)[source]

A simple keyword highlighter for HTML based on html.parser.HTMLParser.

__init__(*args, **kw)[source]

Initialize a KeywordHighlighter object.

Parameters:
  • keywords – A list of strings with keywords to highlight.
  • highlight_template – A template string with the {text} placeholder that’s used to highlight keyword matches.

__call__(data)[source]

Highlight keywords in the given HTML fragment.

Parameters:data – The HTML in which to highlight keywords (a string).
Returns:The highlighted HTML (a string).

handle_charref(value)[source]

Process a numeric character reference.

handle_data(data)[source]

Process textual data.

handle_endtag(tag)[source]

Process an end tag.

handle_entityref(name)[source]

Process a named character reference.

handle_starttag(tag, attrs)[source]

Process a start tag.

handle_startendtag(tag, attrs)[source]

Process a start tag without end tag.

render_attrs(attrs)[source]

Process the attributes of a tag.

reset()[source]

Reset the state of the keyword highlighter.

Clears the output buffer but preserves the keywords to be highlighted. This method is called implicitly during initialization.
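
To illustrate why the highlighter is built on an HTML parser: keywords should be highlighted in text nodes only, never inside tag names or attributes. The sketch below is a hypothetical stand-in for KeywordHighlighter; case insensitive matching and the default template are assumptions.

```python
import html
import re
from html.parser import HTMLParser


class SketchHighlighter(HTMLParser):
    """Hypothetical stand-in that re-emits HTML with highlighted text nodes."""

    def __init__(self, keywords, highlight_template="<b>{text}</b>"):
        alternatives = "|".join(re.escape(k) for k in keywords if k)
        self.pattern = re.compile(alternatives, re.IGNORECASE) if alternatives else None
        self.highlight_template = highlight_template
        super().__init__(convert_charrefs=True)  # calls reset() implicitly

    def reset(self):
        super().reset()
        self.output = []

    def __call__(self, data):
        self.reset()
        self.feed(data)
        self.close()
        return "".join(self.output)

    def render_attrs(self, attrs):
        return "".join(' %s="%s"' % (n, html.escape(v or "")) for n, v in attrs)

    def handle_starttag(self, tag, attrs):
        self.output.append("<%s%s>" % (tag, self.render_attrs(attrs)))

    def handle_startendtag(self, tag, attrs):
        self.output.append("<%s%s/>" % (tag, self.render_attrs(attrs)))

    def handle_endtag(self, tag):
        self.output.append("</%s>" % tag)

    def handle_data(self, data):
        position = 0
        for match in (self.pattern.finditer(data) if self.pattern else ()):
            # Text before the match is re-escaped and emitted unchanged.
            self.output.append(html.escape(data[position:match.start()], quote=False))
            matched = html.escape(match.group(0), quote=False)
            self.output.append(self.highlight_template.format(text=matched))
            position = match.end()
        self.output.append(html.escape(data[position:], quote=False))


highlighter = SketchHighlighter(["span"])
print(highlighter('<span class="x">a span &amp; more</span>'))
```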

chat_archive.html.redirects

Utility functions to pre-process URLs before rendering on a terminal.

In web browsers and chat clients the URLs behind hyperlinks are usually hidden, but in a terminal there’s no “out of band” mechanism to communicate the URL behind a hyperlink - the URL needs to appear literally in the text that is rendered to the terminal.

Given this requirement, I’ve become rather annoyed at Google prefixing every URL they can get their hands on with https://www.google.com/url?q=… because this user hostile “encoding” obscures the intended URL with a lot of fluff that I don’t care for.

This module contains the expand_url() function to transform redirect URLs into their target URL, the strip_redirects() function to transform all redirect URLs in a given text and RedirectStripper to transform all redirect URLs in a given HTML fragment.

chat_archive.html.redirects.GOOGLE_REDIRECT_URL = 'www.google.com/url'

The base URL of the Google redirect service (a string).

Note that the URL scheme is omitted on purpose, to enable a substring search for the Google redirect service regardless of whether a given URL is using the http:// or https:// scheme.

chat_archive.html.redirects.URL_PATTERN = re.compile('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+')

A compiled regular expression pattern to find URLs in text (credit: taken from urlregex.com).

chat_archive.html.redirects.expand_url(url)[source]

Expand a redirect URL to its target URL.

Parameters:url – The URL to expand (a string).
Returns:The expanded URL (a string).

chat_archive.html.redirects.strip_redirects(text)[source]

Expand redirect URLs in the given text.

Parameters:text – The text to process (a string).
Returns:The processed text (a string).

chat_archive.html.redirects.strip_redirects_callback(match)[source]

Apply expand_url() to the matched URL.
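
A sketch of how these three functions could fit together, using urllib.parse from the standard library. Reading the target from the q query parameter follows the https://www.google.com/url?q=… example above; everything else is an assumption rather than the actual implementation.

```python
import re
from urllib.parse import parse_qs, urlparse

GOOGLE_REDIRECT_URL = "www.google.com/url"
URL_PATTERN = re.compile(
    r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
)


def expand_url(url):
    """Return the target of a Google redirect URL, or the URL itself."""
    if GOOGLE_REDIRECT_URL in url:
        # parse_qs() also decodes the percent-encoded target URL.
        targets = parse_qs(urlparse(url).query).get("q")
        if targets:
            return targets[0]
    return url


def strip_redirects_callback(match):
    return expand_url(match.group(0))


def strip_redirects(text):
    """Expand every redirect URL found in the given text."""
    return URL_PATTERN.sub(strip_redirects_callback, text)


redirect = "https://www.google.com/url?q=https%3A%2F%2Fexample.com%2Fpage&sa=D"
print(strip_redirects("see %s for details" % redirect))
```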

class chat_archive.html.redirects.RedirectStripper(*, convert_charrefs=True)[source]

Expand redirect URLs embedded in HTML.

This class uses html.parser.HTMLParser to parse HTML and expand any redirect URLs that it encounters to their target URL. The __call__() method provides an easy way to use this functionality.

__call__(data)[source]

Pre-process the URLs in the given HTML fragment.

Parameters:data – The HTML to pre-process (a string).
Returns:The pre-processed HTML (a string).

handle_charref(value)[source]

Process a numeric character reference.

handle_data(data)[source]

Process textual data.

handle_endtag(tag)[source]

Process an end tag.

handle_entityref(name)[source]

Process a named character reference.

handle_starttag(tag, attrs)[source]

Process a start tag.

handle_startendtag(tag, attrs)[source]

Process a start tag without end tag.

render_tag(tag, attrs, close)[source]

Process the attributes of a tag.

reset()[source]

Reset the state of the redirect stripper.

Clears the output buffer. This method is called implicitly during initialization.

chat_archive.models

Database models for the chat-archive program based on SQLAlchemy.

The chat_archive.models module defines the following database models for the chat-archive program: Account, EmailAddress, TelephoneNumber, Contact, Conversation and Message.

chat_archive.models.metadata = MetaData(bind=None)

Define an explicit naming convention to simplify future database migrations.

class chat_archive.models.Base(**kwargs)

The most base type

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

chat_archive.models.address_mapping = Table('email_address_mapping', MetaData(bind=None), Column('contact_id', Integer(), ForeignKey('contacts.id'), table=<email_address_mapping>), Column('address_id', Integer(), ForeignKey('email_addresses.id'), table=<email_address_mapping>), schema=None)

Mapping table for many-to-many relationship between contacts and email addresses.

chat_archive.models.telephone_number_mapping = Table('telephone_number_mapping', MetaData(bind=None), Column('contact_id', Integer(), ForeignKey('contacts.id'), table=<telephone_number_mapping>), Column('telephone_number_id', Integer(), ForeignKey('telephone_numbers.id'), table=<telephone_number_mapping>), schema=None)

Mapping table for many-to-many relationship between contacts and telephone numbers.
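
To illustrate what such a mapping table does, here is a sketch that uses the standard library sqlite3 module instead of SQLAlchemy so that it has no dependencies. The table and column names follow the definitions above; the sample rows and the addresses_of() helper are hypothetical.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.executescript("""
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE email_addresses (id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE email_address_mapping (
        contact_id INTEGER REFERENCES contacts (id),
        address_id INTEGER REFERENCES email_addresses (id)
    );
    INSERT INTO contacts VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO email_addresses VALUES (1, 'team@example.com'), (2, 'alice@example.com');
    -- Alice has two addresses, one of which she shares with Bob.
    INSERT INTO email_address_mapping VALUES (1, 1), (1, 2), (2, 1);
""")


def addresses_of(contact_id):
    """Follow the mapping table from a contact to its email addresses."""
    rows = connection.execute(
        "SELECT e.value FROM email_addresses AS e"
        " JOIN email_address_mapping AS m ON m.address_id = e.id"
        " WHERE m.contact_id = ? ORDER BY e.value",
        (contact_id,),
    )
    return [value for (value,) in rows]


print(addresses_of(1))
print(addresses_of(2))
```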

class chat_archive.models.Account(**kwargs)[source]

Database model for chat accounts.

id

The primary key of the account (an integer).

backend

The name of the backend that manages this account (a string).

name

A user defined name for the account (a string).

contacts

The contacts that have been imported using this account.

conversations

The conversations that have been imported using this account.

name_is_significant

True if the database contains multiple accounts with this backend, False otherwise.

__repr__()[source]

Render a human friendly representation of an Account object.

__str__()[source]

Render a human friendly representation of an Account object.

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

class chat_archive.models.EmailAddress(**kwargs)[source]

Database model for email addresses of chat contacts.

id

The primary key of the email address (an integer).

value

The email address itself (a string).

__repr__()[source]

Render a human friendly representation of an EmailAddress object.

__str__()[source]

Render a human friendly representation of an EmailAddress object.

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

class chat_archive.models.TelephoneNumber(**kwargs)[source]

Database model for telephone numbers of chat contacts.

id

The primary key of the telephone number (an integer).

value

The telephone number itself (a string).

__repr__()[source]

Render a human friendly representation of a TelephoneNumber object.

__str__()[source]

Render a human friendly representation of a TelephoneNumber object.

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

class chat_archive.models.Contact(**kwargs)[source]

Database model for chat contacts.

id

The primary key of the contact (an integer).

account_id

A foreign key to associate contacts with accounts.

external_id

An optional backend specific identifier for contacts (an opaque string or None).

first_name

The contact’s first name (a string or None).

last_name

The contact’s last name (a string or None).

account

The account that this contact belongs to (an Account object).

email_addresses

The email addresses of this contact.

telephone_numbers

The telephone numbers of this contact.

sent_messages

The chat messages that were sent by this contact.

received_messages

The chat messages that were received by this contact.

first_name_is_unambiguous

True if this first name unambiguously refers to a single contact, False otherwise.

full_name

The full name of the contact (as an SQL expression).

unambiguous_name

The shortest unambiguous name of the contact (a string or None).

__repr__()[source]

Render a human friendly representation of a Contact object.

__str__()[source]

Render a human friendly representation of a Contact object.

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

class chat_archive.models.Conversation(**kwargs)[source]

Database model for chat conversations.

id

The primary key of the conversation (an integer).

account_id

A foreign key to associate conversations with accounts.

external_id

An optional backend specific identifier for conversations (an opaque string or None).

name

An optional name for the conversation (a string or None).

last_modified

The time when the conversation was last modified (a datetime value or None).

import_complete

Whether the full conversation has been imported (a boolean, defaults to False).

import_errors

Whether errors were encountered during the import (a boolean, defaults to False).

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

is_group_conversation

Whether the conversation is a group conversation (a boolean, defaults to False).

account

The account that this conversation belongs to (an Account object).

messages

The chat messages that belong to this conversation.

have_unknown_senders

Whether this conversation includes messages from unknown senders (a boolean).

newest_message

The newest message in the conversation (a Message object or None).

oldest_message

The oldest message in the conversation (a Message object or None).

participants

The Contact objects that have participated in this conversation.

delete_messages()[source]

Delete existing chat messages in the conversation.

__str__()[source]

Render a human friendly representation of a Conversation object.

class chat_archive.models.Message(**kwargs)[source]

Database model for chat messages.

Note that the Message model doesn’t have a direct relationship to the Account model because these two models already have an indirect relationship via the Conversation model (in other words, messages are implicitly namespaced to accounts via conversations).

__init__(**kwargs)

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

id

The primary key of the chat message (an integer).

external_id

An optional backend specific identifier for chat messages (an opaque string or None).

timestamp

The timestamp of the chat message (a datetime value).

conversation_id

A foreign key to associate chat messages with conversations.

sender_id

A foreign key that points to the contact who sent this message (an integer or None).

recipient_id

A foreign key that points to the contact who received this message (an integer or None).

raw

The raw message text in a backend specific format (a string or None).

This field was added to the database schema because the Slack backend emits chat messages in the somewhat peculiar mrkdwn format, which is “almost but not quite” human readable (in my opinion). When the Slack backend imports a new message, the following steps take place:

  1. The original message text is stored without any modifications in the raw column.

  2. A custom mrkdwn parser developed for the chat-archive program is used to convert raw to html (during the import).

  3. The value of html is used to generate the value of text (during the import).

    If this surprises you: I could have developed a second mrkdwn converter with a different output format, but that’s 150 lines of code I don’t care to repeat and html_to_text() works fine for this purpose 😇.

If the custom mrkdwn parser (which is bound to contain bugs) receives bug fixes in a new release of the chat-archive program then raw values can be used to regenerate text and html values.
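
The raw, html and text pipeline can be sketched as follows. Both converters are toy stand-ins: the real mrkdwn parser is far more complete, and this one only handles *bold* and _italic_ and ignores escaping.

```python
import html
import re


def mrkdwn_to_html(raw):
    """Toy converter (an assumption): supports only *bold* and _italic_."""
    converted = re.sub(r"\*([^*\n]+)\*", r"<b>\1</b>", raw)
    return re.sub(r"_([^_\n]+)_", r"<i>\1</i>", converted)


def html_to_text(html_text):
    """Toy converter (an assumption): drop tags and decode references."""
    return html.unescape(re.sub(r"<[^>]+>", "", html_text))


def import_message(raw):
    """Derive html from raw and text from html, as in the steps above."""
    html_value = mrkdwn_to_html(raw)
    return {"raw": raw, "html": html_value, "text": html_to_text(html_value)}


message = import_message("*Deploy* finished, _all_ tests passed")
print(message["html"])
print(message["text"])
```

Because raw is stored unmodified, calling import_message() again with an improved converter would regenerate both derived values.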

text

The human readable plain text of the chat message (a string).

This field cannot be None (NULL) and is expected to always contain a nonempty chat message text. This field is used during searches and when chat-archive --color=never is run.

html

The formatted text of the chat message (a string or None).

When a chat message doesn’t contain text formatting or hyperlinks html will be None and text should be used instead. This field will be used when chat-archive --color=yes is run.

conversation

The conversation that this chat message took place in (a Conversation object or None).

sender

The contact that sent the message (a Contact object or None).

recipient

The contact that received the message (a Contact object or None).

newer_messages

Newer messages in the conversation (not yet sorted!).

next_message

The next message in the conversation (or None).

older_messages

Older messages in the conversation (not yet sorted!).

previous_message

The previous message in the conversation (or None).

find_distance(other_message)[source]

Compute the distance between two messages.

__repr__()[source]

Render a human friendly representation of a Message object.

__str__()[source]

Render a human friendly representation of a Message object.

chat_archive.profiling

Easy to use Python code profiling support.

class chat_archive.profiling.ProfileManager(*args, **kw)[source]

Base class for easy to use Python code profiling support.

This class makes it easy to enable and disable Python code profiling and save the results to a file. You can use it in a with statement to guarantee that the profile is saved even when your program is interrupted with Control-C. So when your program is too slow and you’re wondering why, you can restart the program with profiling enabled, wait for it to get slow, give it a while to collect profile statistics and then interrupt it with Control-C.

When profile_file is set the class initializer method will automatically call enable_profiling().

You can set the values of the profile_file, profiler and profiling_enabled properties by passing keyword arguments to the class initializer.

Here’s an overview of the ProfileManager class:

Superclass: PropertyManager
Special methods: __enter__(), __exit__() and __init__()
Public methods: disable_profiling(), enable_profiling() and save_profile()
Properties: can_save_profile, profile_file, profiler and profiling_enabled

__init__(*args, **kw)[source]

Initialize a ProfileManager object.

Please refer to the PropertyManager documentation for details about the handling of arguments.

__enter__()[source]

Automatically enable code profiling when the with block starts.

__exit__(exc_type=None, exc_value=None, traceback=None)[source]

Disable code profiling and save the profile statistics when the with block ends.
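
A sketch of the with statement pattern using cProfile from the standard library. The class below is a hypothetical simplification, not the real ProfileManager, and the use of cProfile is an assumption made for this sketch. It shows that the profile is saved even when the block is interrupted.

```python
import cProfile
import os
import tempfile


class SketchProfileManager:
    """Hypothetical context manager that saves a profile when the block ends."""

    def __init__(self, profile_file=None):
        self.profile_file = profile_file
        self.profiler = cProfile.Profile() if profile_file else None

    def __enter__(self):
        if self.profiler is not None:
            self.profiler.enable()
        return self

    def __exit__(self, exc_type=None, exc_value=None, traceback=None):
        # Runs for normal exits and for exceptions such as KeyboardInterrupt.
        if self.profiler is not None:
            self.profiler.disable()
            self.profiler.dump_stats(self.profile_file)


profile_file = os.path.join(tempfile.mkdtemp(), "example.prof")
try:
    with SketchProfileManager(profile_file=profile_file):
        sum(i * i for i in range(10000))
        raise KeyboardInterrupt  # simulate the operator pressing Control-C
except KeyboardInterrupt:
    pass
print(os.path.isfile(profile_file))
```

The saved file can be inspected afterwards with the standard library pstats module.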

can_save_profile

True if save_profile() is expected to work, False otherwise.

profile_file[source]

The pathname of a file where Python profile statistics should be saved (a string or None).

Note

The profile_file property is a writable_property. You can change the value of this property using normal attribute assignment syntax.

profiler[source]

A profile.Profile object (if profile_file is set) or None.

Note

The profiler property is a writable_property. You can change the value of this property using normal attribute assignment syntax.

profiling_enabled[source]

True if code profiling is enabled, False otherwise.

Note

The profiling_enabled property is a writable_property. You can change the value of this property using normal attribute assignment syntax.

enable_profiling()[source]

Enable Python code profiling.

disable_profiling()[source]

Disable Python code profiling.

save_profile(filename=None)[source]

Save gathered profile statistics to a file.

Parameters:filename – The pathname of the profile file (a string or None). Defaults to the value of profile_file.
Raises:ValueError when profiling was never enabled or filename isn’t given and profile_file also isn’t set.

chat_archive.utils

Utility functions for the chat-archive program.

chat_archive.utils.ensure_directory_exists(pathname)[source]

Create a directory if it doesn’t exist yet.

Parameters:pathname – The pathname of the directory (a string).

chat_archive.utils.get_full_name()[source]

Find the full name of the current user on the local system based on /etc/passwd.

Returns:A string with the full name of the current user or an empty string when this information is not available.
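
A sketch of such a lookup using the standard library pwd module (UNIX only). Taking the part of the GECOS field before the first comma follows the usual convention for that field and is an assumption about what the real function does.

```python
import os
import pwd


def get_full_name():
    """Look up the current user's full name in the password database."""
    try:
        gecos = pwd.getpwuid(os.getuid()).pw_gecos
    except KeyError:
        # The current user has no entry in the password database.
        return ""
    # By convention the full name is the first comma separated subfield.
    return gecos.split(",")[0].strip()


print(repr(get_full_name()))
```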
chat_archive.utils.get_secret(options, value_option, name_option, description)[source]

Get a secret needed to connect to a chat service (like a password or API token).

Parameters:
  • options – A dictionary with configuration options.
  • value_option – The name of the configuration option that defines the value of a secret (a string).
  • name_option – The name of the configuration option that defines the name of a secret in ~/.password-store (a string). See also get_secret_from_store().
  • description – A description of the type of secret that the operator will be prompted for (a string).
Returns:

The secret (a string).
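
The parameters of get_secret() suggest three possible sources for a secret: a literal value in the configuration, a named entry in the password store, and an interactive prompt. The sketch below shows one plausible lookup order. The order itself, the function name get_secret_sketch, the extra store_lookup and prompt arguments, and the option names used in the example are all assumptions made for illustration.

```python
import getpass


def get_secret_sketch(options, value_option, name_option, description,
                      store_lookup=None, prompt=getpass.getpass):
    """Resolve a secret from configuration, a password store or a prompt."""
    # 1. A literal value in the configuration wins (assumed order).
    if options.get(value_option):
        return options[value_option]
    # 2. Otherwise look up a named entry in the password store.
    if options.get(name_option) and store_lookup is not None:
        return store_lookup(options[name_option])
    # 3. Fall back to prompting the operator.
    return prompt("Enter %s: " % description)


# Usage with stand-ins for the password store and the prompt.
fake_store = {"chat/example": "s3cret"}
options = {"password-name": "chat/example"}
print(get_secret_sketch(options, "password", "password-name", "password",
                        store_lookup=fake_store.__getitem__,
                        prompt=lambda text: "typed"))
```
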

chat_archive.utils.get_secret_from_store(name, directory=None)[source]

Use qpass to get a secret from ~/.password-store.

Parameters:
  • name – The name of a password or a search pattern that matches a single entry in the password store (a string).
  • directory – The directory to use (a string, defaults to ~/.password-store).
Returns:

The secret (a string).

Raises:

ValueError when the given name doesn’t match any entries or matches multiple entries in the password store.
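
The rule that a name must match exactly one entry can be sketched without qpass. The example below is a simplification: it uses case-insensitive substring matching on a plain list of entry names, which is an assumption and not necessarily how qpass matches search patterns. The function name select_single_entry is hypothetical.

```python
def select_single_entry(entry_names, pattern):
    """Return the only entry name that matches, else raise ValueError."""
    matches = [name for name in entry_names
               if pattern.lower() in name.lower()]
    if not matches:
        raise ValueError("No entries match %r!" % pattern)
    if len(matches) > 1:
        raise ValueError("Multiple entries match %r: %s"
                         % (pattern, ", ".join(sorted(matches))))
    return matches[0]


entries = ["chat/slack-token", "chat/telegram-api-hash", "mail/work"]
print(select_single_entry(entries, "slack"))
```

A pattern such as "chat" would raise ValueError here because it matches two entries.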

chat_archive.utils.prompt_for_password(prompt_text)[source]

Interactively prompt the operator for a password.

chat_archive.utils.utc_to_local(utc_value)[source]

Convert a UTC datetime object to the local timezone.
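
A conversion like this can be sketched with the standard library alone. The sketch assumes the input is a naive datetime holding a UTC value and returns a timezone-aware datetime; the real function may differ on both points, and the function name with the _sketch suffix is hypothetical.

```python
from datetime import datetime, timezone


def utc_to_local_sketch(utc_value):
    """Interpret a naive datetime as UTC and convert it to local time."""
    # Mark the value as UTC, then let astimezone() pick the system timezone.
    return utc_value.replace(tzinfo=timezone.utc).astimezone()


stamp = datetime(2018, 8, 1, 12, 0, 0)
print(utc_to_local_sketch(stamp).isoformat())
```
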

Change log

The change log lists notable changes to the project:

Changelog

The purpose of this document is to list all of the notable changes to this project. The format was inspired by Keep a Changelog. This project adheres to semantic versioning.

Release 4.0.2 (2018-12-31)

  • Merged pull request #1: Automatically create archive directory when it doesn’t exist yet.
  • Bumped hangups from 0.4.4 to 0.4.6 to improve Google Hangouts authentication compatibility.

Note

Hangups release 0.4.6 (the latest available) doesn’t actually work for me, although I managed to get it to connect successfully after hacking in captcha support, which I’ve since submitted as pull request #446 🙂.

Release 4.0.1 (2018-08-02)

Just before publishing this project yesterday I propagated a rename throughout the code base, rephrasing “password” as “secret” (my rationale being that “naming things is important” 😇). Unfortunately that rename was propagated a bit more thoroughly than I had intended, impacting the interaction with the Hangups API. This should be fixed in release 4.0.1. For posterity, this relates to the following exception:

AttributeError: 'GoogleAccountCredentials' object has no attribute 'get_password'

Release 4.0 (2018-08-01)

The initial public release! 🎉

Because I love giving mixed signals I’ve decided to use the version number 4.0 for this release (because four chat service backends are supported) but I’ve added the “beta” trove classifier to the setup.py script and I’ve added a big fat disclaimer to the readme (see the status section) 😛.

While publishing the project I decided to be pragmatic and strip the version control history, because in the first weeks of development I hard coded quite a few secrets in the code base. Since then I’ve added support for configuration files and even ~/.password-store but of course those secrets remain in the history…

Now I could have spent hours poring over tens of thousands of lines of patch output to remove those secrets without trashing the history. Instead I decided to do something more useful with my time, hence “pragmatic” above 😇.

PS. This is that “awesome new project” that I’ve been referring to in the humanfriendly changelog. Over the course of developing chat-archive I’ve moved more than six hundred lines of code to the humanfriendly package due to its general purpose nature (the HTML to ANSI conversion).