Welcome to the BookBrainz Developer Docs!

This documentation is intended to act as a guide for developers working on or with the BookBrainz project, describing the system, its modules and functions. The BookBrainz webservice is also fully documented here.

For a description of BookBrainz and end-user-oriented documentation, please see the BookBrainz User Guide.

Contents

BookBrainz Schema

Introduction

The BookBrainz schema describes how the data used by BookBrainz is stored. It’s quite important to have a good idea of this before you look at any of our code. If you’re coming from a MusicBrainz background, our schema is similar, but there are some key differences - see Coming from MusicBrainz.

Definition

The BookBrainz schema is defined in the python-bbschema package, which you can access at https://github.com/BookBrainz/python-bbschema. When the schema is more stable, there’ll be a nice schema diagram here to help you understand how things fit together, but for now, you’ll have to make do with some words.

Entities

In BookBrainz, an entity is a container for some data, associated with some globally unique identifiers (GIDs). BookBrainz represents a number of different objects as entities, and each of these has its own particular associated data. In the following section, we’ll go through the database structures used to represent entities, and talk about each type of entity in more detail.

Generic Entity Tables

Entity and Entity Redirect

All entity GIDs are stored in a single table in the database, entity.

A second table, entity_redirect, allows redirection of GIDs. For example, if one entity was merged into a second entity, then a row would be created in the entity_redirect table to indicate a mapping to the merge target.
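As a rough illustration of how redirect resolution might work, the sketch below uses an in-memory SQLite database. The column names source_gid and target_gid, the helper resolve_gid and the sample GIDs are assumptions made for this example; they are not taken from the actual schema.

```python
import sqlite3

# Hypothetical, simplified versions of the two tables; the real column
# names and types in python-bbschema may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (gid TEXT PRIMARY KEY);
    CREATE TABLE entity_redirect (
        source_gid TEXT PRIMARY KEY,
        target_gid TEXT NOT NULL REFERENCES entity (gid)
    );
""")

# Sample data: "gid-old" has been merged into "gid-new".
conn.execute("INSERT INTO entity (gid) VALUES (?)", ("gid-new",))
conn.execute("INSERT INTO entity_redirect VALUES (?, ?)",
             ("gid-old", "gid-new"))


def resolve_gid(conn, gid, max_hops=10):
    """Follow redirect rows until a GID with no redirect is reached."""
    for _ in range(max_hops):
        row = conn.execute(
            "SELECT target_gid FROM entity_redirect WHERE source_gid = ?",
            (gid,),
        ).fetchone()
        if row is None:
            return gid  # no redirect row: this is the final GID
        gid = row[0]
    raise RuntimeError("redirect chain longer than max_hops")
```

Here, resolve_gid(conn, "gid-old") follows the single redirect row to the merge target, while a GID with no redirect row is returned unchanged. The max_hops limit is only a guard against accidental redirect cycles in this sketch.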

Entities in BookBrainz are versioned, which means that the entire entity history is stored in the database. To make this possible, we have an additional table in the database, called entity_tree.

_images/entity.1degree.png

The entity table and its relationships.

Entity Tree

The entity_tree table is the place where information is actually stored. You can think of this table like a folder on a computer. It points to the various bits of data related to a particular version of the entity. For example, the most recent annotation ID, disambiguation comment ID and entity-specific data ID will usually be stored at this level.

When a new version of an entity is created, a corresponding row is added to entity_tree. The new row records the update by pointing one of its stored IDs at the new data.

_images/entity_tree.1degree.png

The entity_tree table and its relationships.
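The "folder" idea can be sketched in a few lines of Python. This is only an illustration: the EntityTree class and its field names are hypothetical, chosen to mirror the IDs mentioned above, and the ID values are made up.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class EntityTree:
    """Hypothetical stand-in for one row of entity_tree."""
    annotation_id: Optional[int]
    disambiguation_id: Optional[int]
    entity_data_id: int


# First version of an entity: no annotation yet.
v1 = EntityTree(annotation_id=None, disambiguation_id=7, entity_data_id=100)

# A new version that adds an annotation: a new row is created in which
# only the annotation ID changes; the other IDs are carried over.
v2 = replace(v1, annotation_id=42)

# Both rows are kept, so the full history remains available.
history = [v1, v2]
```

Because each version is a separate immutable row in this sketch, comparing two neighbouring rows shows which piece of data a revision changed.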

Entity Data

Each entity tree will point to a particular row in the entity_data table. We use joined table inheritance to represent the different entities in BookBrainz, and this single entity_data table represents the base object in this inheritance hierarchy. It contains an ID and a field to determine the type of entity data stored, known as the discriminator.

_images/entity_data.1degree.png

The entity_data table and its relationships.

Additional Tables

There are some additional tables related to all types of entity. We’ve already mentioned annotations and disambiguations, so let’s talk a little more about those.

An annotation is a way of making notes about an entity, for other editors to read. It stores some content associated with an ID. Disambiguation comments, stored in the disambiguation table, have a similar data structure but are intended to contain a short description to allow editors to easily differentiate between similarly-named entities.

An alias represents a name or title. Each alias will store some text along with a language, and a couple of flags to indicate whether the alias is primary and whether it is native. An entity can only have one native alias, which indicates its original name. It can have many primary aliases, which give the most common names in particular languages. Native aliases will usually also be primary.
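The alias rules described above can be expressed as a small check. The Alias class, its field names, the language codes and the check_aliases helper are all assumptions made for this sketch rather than part of the actual schema.

```python
from dataclasses import dataclass


@dataclass
class Alias:
    """Hypothetical in-memory representation of an alias row."""
    text: str
    language: str
    primary: bool = False
    native: bool = False


def check_aliases(aliases):
    """Raise ValueError if more than one alias is flagged as native."""
    native_count = sum(1 for alias in aliases if alias.native)
    if native_count > 1:
        raise ValueError("an entity can only have one native alias")
    return True


# One native alias (the original name) and several primary aliases,
# one per language.
aliases = [
    Alias("Война и мир", "ru", primary=True, native=True),
    Alias("War and Peace", "en", primary=True),
    Alias("Guerre et Paix", "fr", primary=True),
]
```

Calling check_aliases(aliases) passes for this list; flagging a second alias as native would make it raise ValueError.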

Specific Entities

Publication

Publications represent the books, magazines, articles, newspapers, novels and other published materials catalogued in BookBrainz. The table publication_data stores the entity-specific data for publications, and represents an object derived from entity_data.
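To make joined table inheritance more concrete, here is a minimal sketch using an in-memory SQLite database. Only the table names entity_data and publication_data come from the text above; the column names (including the discriminator column type and the publication_type column), the sample values and the load_entity_data helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Base table: an ID plus the discriminator column.
    CREATE TABLE entity_data (
        id INTEGER PRIMARY KEY,
        type TEXT NOT NULL
    );
    -- Derived table: shares its primary key with the base row.
    CREATE TABLE publication_data (
        id INTEGER PRIMARY KEY REFERENCES entity_data (id),
        publication_type TEXT  -- hypothetical column
    );
""")
conn.execute("INSERT INTO entity_data (id, type) VALUES (1, 'publication')")
conn.execute(
    "INSERT INTO publication_data (id, publication_type) VALUES (1, 'book')"
)


def load_entity_data(conn, data_id):
    """Load the base row, then use the discriminator to pick the derived table."""
    base = conn.execute(
        "SELECT id, type FROM entity_data WHERE id = ?", (data_id,)
    ).fetchone()
    if base is None:
        return None
    result = {"id": base[0], "type": base[1]}
    if base[1] == "publication":
        extra = conn.execute(
            "SELECT publication_type FROM publication_data WHERE id = ?",
            (data_id,),
        ).fetchone()
        result["publication_type"] = extra[0]
    return result
```

The discriminator tells the loader which derived table holds the rest of the data, so other entity types could be added as further derived tables without changing entity_data itself.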

Coming from MusicBrainz

This page describes the key differences between the BookBrainz and MusicBrainz schemas, for developers already familiar with the MusicBrainz schema.

Entity Base Object

In BookBrainz, there is an “Entity” base object. What this means is that all entities share some common data, and store this data in a single table in the database. This greatly simplifies the rest of the database compared to MusicBrainz - instead of having a set of tables defined for each type of entity, we only need a single set of tables referencing the base table for entities.

The following fields are stored in the base table:

  • gid
  • master_revision_id
  • last_updated

BookBrainz Webservice

The BookBrainz webservice lets developers write programs that use BookBrainz data. The BookBrainz site itself uses the webservice to access and modify data.

Authentication

To allow users to authenticate with BookBrainz, the webservice implements OAuth 2. We use the password grant type, meaning that clients must forward user credentials (username and password) to the website over HTTPS.

When a client successfully authenticates a user, it receives an access token, which must be sent in the header of every subsequent POST request.

Example requests are shown below, for a username “Jim” and password “Bob123”.

Authentication Request:

{
  "client_id": "de305d54-75b4-431b-adb2-eb6b9e546013",
  "username": "Jim",
  "password": "Bob123"
}

Authentication Response:

{
  "access_token": "f47ac10b58cc4372a5670e02b2c3d479",
  "refresh_token": "16fd27068baf433b82eb8c7fada847da"
}

Subsequent POST:

HEADERS:
  Authorization: Bearer f47ac10b58cc4372a5670e02b2c3d479

{
  "post": "data",
  "goes": "here"
}
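A client might build these requests along the following lines, using only the Python standard library. This is a sketch under several assumptions: the base URL, the /oauth/token and /publication paths, the JSON content type and the helper function names are all hypothetical and are not documented here. The functions only construct the requests; nothing is sent.

```python
import json
import urllib.request

# Hypothetical base URL; replace with the real webservice address.
BASE_URL = "https://bookbrainz.example/ws"


def build_auth_request(client_id, username, password):
    """Build the authentication request carrying the user's credentials."""
    body = json.dumps({
        "client_id": client_id,
        "username": username,
        "password": password,
    }).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/oauth/token",  # hypothetical path
        data=body,
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


def build_authorized_post(path, payload, access_token):
    """Build a POST request carrying the access token in its header."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + access_token,
        },
        method="POST",
    )


auth_request = build_auth_request(
    "de305d54-75b4-431b-adb2-eb6b9e546013", "Jim", "Bob123"
)
post_request = build_authorized_post(
    "/publication",  # hypothetical path
    {"post": "data", "goes": "here"},
    "f47ac10b58cc4372a5670e02b2c3d479",
)
```

Either request could then be sent with urllib.request.urlopen, and the access_token field read from the JSON body of the authentication response.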