Binary Structured Data Format¶
The Binary Structured Data Format (BSDF) is an open specification for serializing (scientific) data, for the purpose of storage and inter-process communication.
It's designed to be a simple format, making it easy to implement in many programming languages. However, the format allows implementations to support powerful mechanisms such as lazy loading of binary data, and streamed reading/writing.
BSDF is a binary format; by giving up on human readability, BSDF can be simple, compact and fast. See the full specification, or how it compares to other formats.
The source code is at Gitlab.
Data types and extensions¶
BSDF supports 8 base types: null, booleans, integers, floats, strings/text, (heterogeneous) lists, mappings (i.e. dictionaries), and binary blobs. Integers and floats represent 64 bit numbers, but can be encoded using fewer bytes. Binary blobs can optionally be compressed (zlib or bz2), can have checksums, and can be resized.
Via an efficient extension mechanism, other data types (including custom ones) can be serialized. The standard extensions work out of the box, supporting e.g. nd-arrays and complex numbers.
Status¶
The format is complete, except for a few details such as how to deal with blob checksums. All implementations comply with the format and are well-tested. We could do with implementations in additional languages though!
Implementations¶
Implementations currently exist for multiple languages. Each implementation is continuously tested to ensure compatibility.
- The Python implementation in the form of bsdf.py.
- The lite Python implementation in the form of bsdf_lite.py.
- The Matlab / Octave implementation in the form of Bsdf.m.
- The JavaScript implementation in the form of bsdf.js.
We'd like implementations for other languages (such as R and Julia). BSDF is designed to be easy to implement; perhaps you want to contribute?
We aim for the implementations to have similar APIs: a class whose instances hold extensions and options, and which has encode(), decode(), save(), load(), and add_extension() methods. Optionally, an implementation can provide convenience functions.
There is also a command line interface that can be used to e.g. create and view BSDF files.
Installation¶
See the specific implementations for detailed installation instructions. Most implementations consist of a single file.
Examples¶
In Python:
>>> import bsdf
>>> b = bsdf.encode(['just some objects', {'foo': True, 'bar': None}, 42.001])
>>> b
b'BSDF\x02\x00l\x03s\x11just some objectsm\x02\x03fooy\x03barvd\xe3\xa5\x9b\xc4 \x00E@'
>>> len(b)
48
>>> bsdf.decode(b)
['just some objects', {'foo': True, 'bar': None}, 42.001]
For more Python examples, see the Python example notebook.
In JavaScript:
> bsdf = require('bsdf.js')
{ encode: [Function: bsdf_encode],
decode: [Function: bsdf_decode],
BsdfSerializer: [Function: BsdfSerializer],
standard_extensions: ...}
> b = bsdf.encode(['just some objects', {foo: true, bar: null}, 42.001])
ArrayBuffer { byteLength: 48 }
> bsdf.decode(b)
[ 'just some objects', { foo: true, bar: null }, 42.001 ]
In Matlab / Octave:
>> bsdf = Bsdf()
>> b = bsdf.encode({'just some objects', struct('foo', true, 'bar', []), 42.001});
>> size(b)
ans =
48 1
>> bsdf.decode(b)
ans =
{
[1,1] = just some objects
[1,2] =
scalar structure containing the fields:
foo = 1
bar = [](0x0)
[1,3] = 42.001
}
It is worth noting that although different languages may represent data types in slightly different ways, the underlying bytes in BSDF are the same. This makes BSDF suited for inter-language communication.
License¶
In principle, all implementations in the BSDF repository use the 2-clause BSD license (see LICENSE for details), unless otherwise specified. All code is liberally licensed (BSD- or MIT-like).
The BSDF Command Line Interface¶
BSDF has a command line interface (CLI) for performing simple tasks, such as inspecting, converting and creating BSDF files. The CLI is part of the Python implementation, so pip install bsdf to start using it.
Using the CLI¶
After installation, depending on your Python setup, the CLI may be available as the bsdf command. If this is the case, you can run:
$ bsdf ...
If this is not the case, or if you want to target a specific Python version, use:
$ python -m bsdf ...
Getting help¶
To get started, run the help command:
$ bsdf help
which yields:
Command line interface for the Binary Structured Data Format.
See http://bsdf.io for more information on BSDF.
usage: bsdf command [options]
Available commands:
convert - Convert one format into another (e.g. JSON to BSDF).
create - Create a BSDF file from data obtained by evaluation Python code.
help - Show the help text.
info - Print meta information about the given BSDF file.
version - Print the version of the current Python implementation.
view - View the content of a given BSDF file.
Run 'bsdf help command' or 'bsdf command --help' to learn more.
Dive deeper using e.g.
$ bsdf help view
Example¶
$ bsdf create foo.bsdf '["xx", 4, None, [3, 4, 5]*3]'
$ bsdf info foo.bsdf
BSDF info for: C:\dev\pylib\bsdf\python\foo.bsdf
file_name: foo.bsdf
file_size: 45
file_mtime: 2017-12-21 15:21:41
is_valid: true
file_version: 2.1
$ bsdf view foo.bsdf
[ list with 4 elements
'xx'
4
null
[ list with 9 elements
3
4
5
3
4
5
3
4
5
]
]
$ bsdf view foo.bsdf --depth=1
[ list with 4 elements
'xx'
4
null
[ list with 9 elements ]
]
Comparing BSDF with other formats¶
The question that arises with any new format: Why, oh Why? Why yet another format!?
In short, there was no format that could serialize nd-array data well, and also work well on the web. The realization that HDF5 is not so great, a strong need to send scientific data between Python and JavaScript, and a repeated annoyance with JSON have nudged me to create BSDF.
This page tries to compare BSDF with other formats, and explains why these formats were, in my view, insufficient for my needs.
BSDF vs JSON¶
Although JSON is very widely used, it has several limitations:
- JSON's inability to encode nan and inf can be painful.
- No support for binary data or nd-arrays (base64 is a compromise worth avoiding).
- It's kind of human readable, but very verbose, and not easy to write (e.g. a comma after the last item in a list breaks things).
- Many JSON implementations allow extending the types, but this involves an extra function call for each element, which degrades the performance.
BSDF vs UBJSON et al.¶
Binary formats commonly used on the web that were considered are ubjson, msgpack, and bson. Most are rather web-oriented, or adhere strictly to JSON compatibility (e.g. no nan). Most do not support typed arrays, let alone nd-arrays, and/or decode such arrays in JavaScript as regular arrays instead of array buffers. In short, none of these seemed to provide the flexibility that a scientific data format needs.
BSDF differs from most of them by its flexibility for encoding binary data, and its simple extension mechanism.
It's worth noting that BSDF does not support typed arrays as one of its base types, but the extension for typed nd-arrays is a standard extension available in most implementations.
BSDF vs HDF5¶
HDF5 is a popular format for scientific data, but there are also good reasons to avoid it, as explained in e.g. the paper on ASDF and this blog post. Summarizing:
- HDF5 is a complex specification and (therefore) there is really just one implementation that actually works.
- The implementation sometimes has bugs or performance issues, but there are no alternatives.
- Not human readable, and no other tools for inspection except that one implementation.
- No proper mappings (dicts) and lists.
HDF5 is certainly more flexible, e.g. with regard to lazily loading parts of compressed data. However, BSDF does support resizing of binary data, in-place editing, lazy loading, and streamed reading and writing.
BSDF vs ASDF¶
The ASDF format has goals that partly overlap with the purpose of BSDF:
- intrinsic hierarchical structure
- human readable
- based on existing data format (yaml)
- support for references (also to external objects)
- efficient updating
- machine independent, structured data, ndarrays
- support for writing (and reading) streams
- explicit versioning
- explicit extensibility without interference
- support for validation with schemas
ASDF was seriously considered before the development of BSDF started. The idea of a human readable format is appealing, but ...
- Yaml is a rather ill-defined format that is hard to parse, which is probably why the parser is so slow.
- Data that consist of many elements (but not so much blobs) will be encoded inefficiently.
- Many text editors won't deal well with huge text files.
- If the text is edited, byte alignments are likely to break.
- It makes the format more complex (you basically have two formats).
This is why BSDF drops human readability, gaining a format that is simple, compact, and fast to parse. This is not to say that ASDF did it wrong; it is very suited for what it was designed for. But BSDF is more suited for e.g. inter process communication.
BSDF vs Arrow¶
The goals of Apache Arrow bear similarities with BSDF, with e.g. a clear standard and zero copy reads. However, it's rather focussed on columnar data (where BSDF supports nd-arrays), and seems oriented at compiled languages, i.e. less flexible. Although the specification looks easy to read, the Python implementation is much larger than BSDF's 800 or so lines of code. It's also not pure Python, making it nontrivial to install on less common Python versions/implementations.
BSDF vs NPZ¶
NumPy has a built-in way to encode typed arrays. However, this is limited to arrays (no metadata), and rather specific to Python.
BSDF vs SSDF (and BSDF v1)¶
Around 2011 I developed a human readable file format called SSDF, suited for storing hierarchical data, similar to JSON, but with support for nan and inf. It also supports nd-arrays, via base64 encoding and zlib compression. I've used this in several (scientific) projects (e.g. it was used in the Pyzo IDE to store config data).
Although it does serve its purpose, it's not terribly good for large binary datasets. I also kept coming back in need of a format to send binary data to/from JS, where compression is a problem.
At some point I developed a binary equivalent of SSDF that's fully compatible, but stored binary data more effectively. The current BSDF format can be seen as its successor, being both simpler and more extensible. This is also why BSDF's version number starts at 2.
I am currently of the opinion that a format that is good at binary data cannot also be good at being a human readable (config) format. See e.g. toml for a well-readable format.
Contributing to BSDF¶
There are several ways that you can contribute to BSDF: from reporting bugs in the issue tracker, to providing fixes and improvements, or even contributing new implementations.
Organization of the code¶
Since BSDF is designed to be simple, implementations are usually restricted to a single module. The BSDF Gitlab repo contains implementations for several languages, organized in sub directories. This allows testing each implementation using a "test service", and ensures compatibility between the different implementations.
Development dependencies¶
The tooling around BSDF is implemented in Python. For development, you need Python 3.x and the invoke library (pip install invoke).
To run tasks such as tests, run invoke from the root repo to get started.
Workflow¶
To contribute an enhancement or new implementation, please open an issue first to start the discussion. The actual code will be contributed via pull requests.
It is expected that each implementation will be more or less maintained by its own group of contributors.
Code of conduct¶
BSDF does not have an official code of conduct yet, but let's just say that we expect respect from and towards all contributors, and will not tolerate discrimination or trolling.
Extending BSDF¶
BSDF can encode special kinds of data by providing the serializer with extensions. How users specify extensions is specific to the implementation, but they will typically consist of 4 elements:
- A name to identify it with. This will be encoded along with the data, so better keep it short, although custom extensions are best prefixed with the context (e.g. 'mylibrary.myextension'), to avoid name clashes.
- A type and/or a match function, so that the BSDF encoder can determine what objects must be serialized.
- An encoder function to convert the special object to more basic objects.
- A decoder function to reconstruct the special object from the basic objects.
How it works¶
Extensions encode high-level data types into more basic data types, such as the base BSDF types, or types supported by other extensions. Upon decoding, the extension reconstructs the high-level data from the "lower level" data. When an extension is not available during decoding, a warning is produced, and the object is represented in its underlying basic form.
Extensions add very little overhead in speed (unlike e.g. JSON). In terms of memory, each object being converted needs a little extra memory to encode the extension's name.
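The round trip described above can be modeled in a few lines of Python. This is a minimal in-memory sketch, not any implementation's API: the `Ext` record and the `to_basic`/`from_basic` helpers are hypothetical names, and no bytes are written. It shows the match/encode step, the decode step, and the fallback to the basic form (with a warning) when the extension is unknown.

```python
import warnings
from collections import namedtuple

# Hypothetical record holding the four elements an extension consists of.
Ext = namedtuple("Ext", ["name", "match", "encode", "decode"])

complex_ext = Ext(
    name="c",
    match=lambda v: isinstance(v, complex),
    encode=lambda v: [v.real, v.imag],     # high-level value -> basic types
    decode=lambda v: complex(v[0], v[1]),  # basic types -> high-level value
)


def to_basic(value, extensions):
    """Return (extension_name_or_None, basic_value)."""
    for ext in extensions:
        if ext.match(value):
            return ext.name, ext.encode(value)
    return None, value


def from_basic(name, value, extensions):
    """Rebuild the value; keep the basic form if the extension is unknown."""
    if name is None:
        return value
    for ext in extensions:
        if ext.name == name:
            return ext.decode(value)
    warnings.warn("no extension named %r, returning the basic form" % name)
    return value


name, basic = to_basic(3 + 4j, [complex_ext])
print(name, basic)                             # c [3.0, 4.0]
print(from_basic(name, basic, [complex_ext]))  # (3+4j)
```

Decoding the same data with an empty extension list returns `[3.0, 4.0]` and emits a warning, which mirrors the fallback behaviour described above.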
Kinds of extensions¶
Everyone can write their own extension and use it in their own work.
The purpose of this document is to specify ways to convert common data types, and how these extensions should be named. If everyone adheres to these specifications, data will be easier to share.
BSDF also defines a small set of standard extensions, which users are strongly encouraged to follow, and which all BSDF implementations are encouraged to support by default.
Status¶
This is a work in progress and the specifications below are subject to change. The standardization of a base set of extensions should settle soonish after the BSDF format itself has stabilized.
Standard extensions¶
Complex numbers¶
- name: "c"
- encoding: a list with two elements (the real and the imaginary part).
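At the byte level, an extension value is written with the upper-case identifier of its underlying type, followed by the extension name and then the data (see the format description further down). A minimal sketch for a complex number, assuming both parts are written as 64-bit floats and that the name is written like a mapping key, i.e. a size byte plus UTF-8 bytes without an `s` identifier; `encode_complex` is a hypothetical helper.

```python
import struct


def encode_size(n):
    # Size item: one byte below 251, otherwise 253 followed by a uint64.
    return struct.pack("<B", n) if n < 251 else struct.pack("<BQ", 253, n)


def encode_complex(z):
    name = "c".encode("utf-8")
    out = b"L"                               # 'l' (list) in upper case: an extension value
    out += encode_size(len(name)) + name     # extension name (assumed: size + UTF-8 bytes)
    out += encode_size(2)                    # the underlying list has two elements
    out += b"d" + struct.pack("<d", z.real)  # real part as float64
    out += b"d" + struct.pack("<d", z.imag)  # imaginary part as float64
    return out


bb = encode_complex(3 + 4j)
print(len(bb))  # 22
```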
N-dimensional arrays¶
- name: "ndarray"
- encoding: a dict with elements:
- 'dtype', a string that specifies the data type. Minimal support should be 'uint8', 'int8', 'uint16', 'int16', 'uint32', 'int32', 'float32', and preferably 'uint64', 'int64' and 'float64'.
- 'shape', a list with as many elements (integers) as the array has dimensions, listed from the slowest-changing to the fastest-changing dimension (as in C order).
- 'data', a blob of bytes representing the contiguous data.
We might add an "order" field at a later point. This will need to be investigated/discussed further. Until then, C-order (row-major) should be assumed where it matters.
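A sketch of the dict an ndarray extension could hand to the encoder, built with the standard-library `array` module instead of NumPy (an assumption made to keep the example dependency-free). Rows are flattened in C order, so the last index varies fastest. `ndarray_payload` is a hypothetical helper that only handles 2D nested lists and a few dtypes.

```python
import sys
from array import array

# Mapping from BSDF dtype names to array typecodes (subset, for illustration).
TYPECODES = {"uint8": "B", "int16": "h", "float32": "f", "float64": "d"}


def ndarray_payload(rows, dtype="float32"):
    """Build the 'dtype'/'shape'/'data' dict for a 2D list of numbers."""
    flat = array(TYPECODES[dtype], [x for row in rows for x in row])  # C order
    if sys.byteorder == "big":
        flat.byteswap()  # BSDF stores words in little endian format
    return {
        "dtype": dtype,
        "shape": [len(rows), len(rows[0])],  # [rows, columns]
        "data": flat.tobytes(),
    }


payload = ndarray_payload([[1, 2, 3], [4, 5, 6]])
print(payload["shape"], len(payload["data"]))  # [2, 3] 24
```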
Other extensions¶
2D image data¶
- name: 'image2d'
- encoding: a dict with elements:
- array: an ndarray with 2 or 3 dimensions
- meta: a dict with arbitrary data
If the data is 3D, the third dimension represents the color channels (1: L, 2: LA, 3: RGB or 4: RGBA).
3D image data¶
- name: 'image3d'
- encoding: a dict with elements:
- array: an ndarray with 3 or 4 dimensions
- meta: a dict with arbitrary data
If the data is 4D, the fourth dimension represents the color channels (1: L, 2: LA, 3: RGB or 4: RGBA).
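The channel convention shared by both image extensions fits in a small lookup. This is a hypothetical helper for illustration; it assumes the color channels live in the last dimension, and that an array without a channel dimension is a single-channel (L) image, which the text above does not state explicitly.

```python
CHANNEL_MODES = {1: "L", 2: "LA", 3: "RGB", 4: "RGBA"}


def color_mode(shape, spatial_dims):
    """Return the color mode; spatial_dims is 2 for 'image2d', 3 for 'image3d'."""
    if len(shape) == spatial_dims:
        return "L"  # assumed: no channel dimension means a single channel
    if len(shape) == spatial_dims + 1:
        return CHANNEL_MODES[shape[-1]]  # KeyError for unsupported channel counts
    raise ValueError("expected %d or %d dimensions" % (spatial_dims, spatial_dims + 1))


print(color_mode([480, 640, 3], 2))      # RGB
print(color_mode([64, 128, 128, 4], 3))  # RGBA
```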
The BSDF format specification¶
This document applies to BSDF format VERSION = 2.2.
Purpose and features¶
The purpose of BSDF is to provide a data format that is ...
- easy to implement, such that it can easily spread to other programming languages.
- suitable for working with binary (scientific) data.
- suitable for inter process communication and the web.
This has resulted in the following features:
- A binary format that has a simple specification.
- Language agnostic and machine independent.
- Compact storage.
- Fast encoding and decoding. E.g. the pure Python implementation has a respectable speed, and can be made faster via e.g. a C implementation.
- Support for binary blobs, in uncompressed format or compression with zlib or bz2.
- Uses data types that are widely supported in most languages.
- Provides a mechanism to easily convert to/from special data types, with minimal effect on performance, also across languages.
- Data can be read and written without seek operations (e.g. to allow (streamed) reading from remote resources).
- Zero copy reads (in uncompressed data, bytes are aligned).
- Implementations can provide direct access to blobs via a file-like object for lazy loading or efficient updating.
- Provides a way to stream data (e.g. as a list at the end of the file that can simply be appended to).
Also see how BSDF compares to other formats.
Minimal implementation¶
A minimal BSDF implementation must support:
- the basic data types: null, bool, int, float, string, list, mapping, and uncompressed binary blobs.
- reading (closed and unclosed) streams (at the end of a data structure).
- preferably most standard extensions.
Implementations are encouraged to support:
- user-defined extensions.
- compressed binary blobs (zlib and bz2).
Further, implementations can be made more powerful by supporting:
- Lazy loading of blobs.
- Editing of (uncompressed) blobs.
- Lazy loading of streams.
- Deferred writing of streams.
The format¶
Each data value is identified using a 1 byte character in the ASCII range. If this identifier is a capital letter (smaller than ASCII 95), it means that it's a value to be converted via an extension. If so, the next item is a string (see below for its encoding) representing the extension name. Next is the data itself. All words are stored in little endian format.
Encoding of size¶
Sizes (of e.g. lists, mappings, strings, and blobs) are encoded as follows: if the size is smaller than 251, a single byte (uint8) is used. Otherwise, the first byte is 253, and the next 8 bytes represent the size using an unsigned 64bit integer. (The bytes 254 and 255 are used to identify (closed and unclosed) streams, and 251-252 are reserved.)
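A minimal Python sketch of this rule; `encode_size` and `decode_size` are illustrative names, not part of a BSDF API.

```python
import struct


def encode_size(n):
    """Encode a size item: 1 byte if n < 251, else 253 plus a uint64 (9 bytes)."""
    if n < 251:
        return struct.pack("<B", n)
    return struct.pack("<BQ", 253, n)


def decode_size(bb, pos=0):
    """Decode the size item at bb[pos]; return (size, position after the item)."""
    first = bb[pos]
    if first < 251:
        return first, pos + 1
    if first == 253:
        return struct.unpack_from("<Q", bb, pos + 1)[0], pos + 9
    # 254 and 255 mark (closed and unclosed) streams; 251 and 252 are reserved.
    raise ValueError("not a plain size item: %d" % first)


print(encode_size(17))         # b'\x11'
print(len(encode_size(1000)))  # 9
```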
Header¶
Data encoded with BSDF starts with the following 6-byte header:
- 4 identifier bytes: ASCII BSDF, equivalent to 1178882882 when read as a little-endian uint32.
- Two variable size unsigned integers (uint8 in practice, assuming version numbers smaller than 251) indicating the major and minor version numbers.
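A sketch of writing and checking the header; `make_header` and `read_header` are hypothetical helper names. It also confirms the uint32 value quoted above.

```python
import struct


def make_header(major=2, minor=2):
    # 'BSDF' magic followed by the major and minor version, one byte each (< 251)
    return b"BSDF" + struct.pack("<BB", major, minor)


def read_header(bb):
    if bb[:4] != b"BSDF":
        raise ValueError("not BSDF data")
    return bb[4], bb[5]  # (major, minor)


print(struct.unpack("<I", b"BSDF")[0])  # 1178882882
print(read_header(make_header()))       # (2, 2)
```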
null¶
The value null/nil/none is identified by v (for void), and has no data.
booleans¶
The values false and true are identified by n for no, and y for yes, respectively. These values have no data.
integers¶
Integer values come in two flavours:
- h: small values (between -32768 and 32767, inclusive) can be encoded using int16.
- i: int64
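A sketch of choosing between the two integer flavours with Python's `struct` module. An int16 holds values from -32768 up to 32767, so that is the range used for `h` here; `encode_int` and `decode_int` are hypothetical helpers.

```python
import struct


def encode_int(v):
    # Small values fit in a signed 16-bit integer; everything else uses int64.
    if -32768 <= v <= 32767:
        return b"h" + struct.pack("<h", v)
    return b"i" + struct.pack("<q", v)


def decode_int(bb):
    if bb[:1] == b"h":
        return struct.unpack("<h", bb[1:3])[0]
    if bb[:1] == b"i":
        return struct.unpack("<q", bb[1:9])[0]
    raise ValueError("not an integer value")


print(encode_int(4))            # b'h\x04\x00'
print(len(encode_int(100000)))  # 9
```

With this rule each small integer takes 3 bytes, which is consistent with the 45-byte file in the CLI example.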
floats¶
Float values follow the IEEE 754 standard, can be NaN and inf, and come in two flavours:
- f: a 32bit float
- d: a 64bit float
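A sketch of the two float flavours. NaN and inf need no special handling because IEEE 754 represents them directly. `encode_float` is a hypothetical helper; its `float64` flag mirrors the option of that name in the Python implementation.

```python
import math
import struct


def encode_float(v, float64=True):
    # 'd' = IEEE 754 float64, 'f' = float32 (may lose precision)
    if float64:
        return b"d" + struct.pack("<d", v)
    return b"f" + struct.pack("<f", v)


bb = encode_float(float("nan"))
print(len(bb), math.isnan(struct.unpack("<d", bb[1:])[0]))  # 9 True
```

For example, `encode_float(42.001)` gives the same 9 bytes that end the Python example at the top of this document.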
strings¶
String values are identified by s (for string), and consist of a size item (1 or 9 bytes), followed by the bytes that represent the UTF-8 encoded string.
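A sketch showing that the size item counts UTF-8 bytes rather than characters (hypothetical helper names):

```python
import struct


def encode_size(n):
    return struct.pack("<B", n) if n < 251 else struct.pack("<BQ", 253, n)


def encode_str(s):
    bb = s.encode("utf-8")
    # 's', then the number of UTF-8 bytes (not characters), then the bytes
    return b"s" + encode_size(len(bb)) + bb


print(encode_str("just some objects")[:2])  # b's\x11'
print(len(encode_str("héllo")))             # 8
```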
blobs¶
Binary data is encoded as follows:
- char b (for blob).
- uint8 value indicating the compression. 0 means no compression, 1 means zlib, 2 means bz2.
- allocated_size: the amount of space allocated for the blob, in bytes.
- used_size: the amount of used space for the blob, in bytes.
- data_size: the size of the blob when decompressed, in bytes. If compression is off, it must be equal to used_size.
- checksum: a single byte 0x00 means no hash; a byte 0xFF means that there is one, and is followed by a 16-byte md5 hash of the used (compressed) bytes.
- Byte alignment indicator: a uint8 number indicating the number of bytes to skip before the data starts. Implementations must align the data to 8-byte boundaries, but larger boundaries (up to 256) are allowed.
- Empty space: a number of empty bytes, as indicated by the byte alignment indicator.
- The binary blob, used_size bytes.
- Empty space, allocated_size minus used_size bytes. This space may have been caused by a reduction in the size of the blob, or may be allocated to allow increasing the size of the blob.
Note: at this moment, some implementations can write checksums, but none actually use them to validate the data. A policy regarding checksums still has to be decided, and implementations will then have to implement it.
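Two of the blob fields involve a small computation. A sketch, assuming the offset at which the alignment indicator byte is written is known; the helper names are hypothetical, and the order of the remaining blob fields is not modeled here.

```python
import hashlib


def checksum_field(used_bytes, use_checksum):
    """0x00 for no checksum, or 0xFF followed by the 16-byte MD5 of the used bytes."""
    if use_checksum:
        return b"\xff" + hashlib.md5(used_bytes).digest()
    return b"\x00"


def alignment_padding(indicator_pos, boundary=8):
    """Empty bytes needed after the indicator byte so the data starts aligned."""
    data_start = indicator_pos + 1  # data would start right after the indicator byte
    return (-data_start) % boundary


print(len(checksum_field(b"abc", True)))  # 17
print(alignment_padding(20))              # 3
```

In this sketch the padding is 0 when the data would already start on a boundary; a writer could equally pick any value that reaches a larger allowed boundary.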
lists¶
List values consist of the identifier l (for list), followed by a size item that represents the length of the list n. After that, n values follow, which can be of any type.
mappings¶
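Mappings, a.k.a. dictionaries or structs, consist of the identifier m (for mapping), followed by a size item that represents the length of the mapping n. After that, n items follow, each time a combination of a string (the key) and the value. Keys are written as a size item followed by the UTF-8 encoded bytes, without the s identifier (see e.g. \x03foo and \x03bar in the Python example).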
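Putting the rules for the basic types together, a minimal encoder fits in a few dozen lines of Python. This is a sketch, not the reference implementation: it skips blobs, extensions and streams, always writes floats as float64, and writes mapping keys as a size item plus UTF-8 bytes without the `s` identifier, which is what the 48-byte Python example at the top of this document shows. The version bytes default to 2.0 only so that the output can be compared with that example.

```python
import struct


def encode_size(n):
    return struct.pack("<B", n) if n < 251 else struct.pack("<BQ", 253, n)


def encode_value(v):
    """Encode a value built from base types (no blobs, extensions or streams)."""
    if v is None:
        return b"v"
    if isinstance(v, bool):  # check bool before int: bool is an int subclass
        return b"y" if v else b"n"
    if isinstance(v, int):
        if -32768 <= v <= 32767:
            return b"h" + struct.pack("<h", v)
        return b"i" + struct.pack("<q", v)
    if isinstance(v, float):
        return b"d" + struct.pack("<d", v)
    if isinstance(v, str):
        bb = v.encode("utf-8")
        return b"s" + encode_size(len(bb)) + bb
    if isinstance(v, list):
        return b"l" + encode_size(len(v)) + b"".join(encode_value(x) for x in v)
    if isinstance(v, dict):
        out = b"m" + encode_size(len(v))
        for key, val in v.items():
            kb = key.encode("utf-8")
            out += encode_size(len(kb)) + kb + encode_value(val)  # key: size + bytes
        return out
    raise TypeError("unsupported type: %s" % type(v).__name__)


def encode(ob, version=(2, 0)):
    return b"BSDF" + struct.pack("<BB", *version) + encode_value(ob)


bb = encode(["just some objects", {"foo": True, "bar": None}, 42.001])
print(len(bb))  # 48
```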
Streaming¶
Streams allow data to be written and read in a "lazy" fashion. Implementations are not required to support streaming itself, but must be able to read data that contains (closed and unclosed) streams.
Data that is "streaming" must always be the last object in the file (except for its sub items). BSDF currently specifies that streaming is only supported for lists. It will likely also be supported for blobs.
Streams are identified by the size encoding which starts with 254 or 255, followed by an unsigned 64 bit integer. For closed streams (254), the integer represents the number of items in the stream. For unclosed streams (255) the 64 bit integer must be ignored.
Encoder implementations can thus close a stream by changing the 255 to 254 and writing the real size in the next 8 bytes. Alternatively, an implementation can turn it into a regular encoded list (not streamed) by writing 253 instead. Note that in the latter case the list cannot be read as a stream anymore.
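The patching step can be sketched with an in-memory file. The helper names are hypothetical, and the appended items are written as small integers to keep the example short.

```python
import io
import struct


def open_list_stream(f):
    """Write an unclosed list stream header; return the offset of its size byte."""
    f.write(b"l")
    pos = f.tell()
    f.write(struct.pack("<BQ", 255, 0))  # 255 = unclosed, the uint64 is ignored
    return pos


def close_list_stream(f, pos, count, unstream=False):
    """Patch the header: 254 = closed stream, 253 = regular (non-streamed) list."""
    end = f.tell()
    f.seek(pos)
    f.write(struct.pack("<BQ", 253 if unstream else 254, count))
    f.seek(end)  # return to the end so more data could follow


f = io.BytesIO()
f.write(b"BSDF\x02\x02")
pos = open_list_stream(f)
for item in (1, 2, 3):
    f.write(b"h" + struct.pack("<h", item))  # append small integers
close_list_stream(f, pos, 3)
print(f.getvalue()[6:16])  # b'l\xfe\x03\x00\x00\x00\x00\x00\x00\x00'
```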
BSDF JavaScript implementation¶
This implementation of BSDF is intended for use in NodeJS or the browser. It is a "lite" implementation, without support for e.g. lazy loading or streaming.
Usage¶
Basic usage:
var bsdf = require('bsdf.js');
var data1 = ...
var bytes = bsdf.encode(data1); // produces an ArrayBuffer
var data2 = bsdf.decode(bytes); // bytes can be ArrayBuffer, DataView or Uint8Array.
Full example using extensions:
// A class that we want to encode
function MyOb(val) {
this.val = val;
}
// The extension that can encode/decode it
var myext = {name: 'test.myob',
match: function (s, v) { return v instanceof MyOb; },
encode: function (s, v) { return v.val; },
decode: function (s, v) { return new MyOb(v); }
};
// Determine extensions to use (include standard ones)
var extensions = bsdf.standard_extensions.concat([myext]);
// Encode and decode
var data1 = new MyOb(42);
var bytes = bsdf.encode(data1, extensions);
var data2 = bsdf.decode(bytes); // -> the raw value, 42
var data3 = bsdf.decode(bytes, extensions); // a MyOb instance with value 42
Reference:¶
Function encode(data, extensions)¶
Encode the data, using the provided extensions (or the standard extensions if not given). Returns an ArrayBuffer representing the encoded data. See BsdfSerializer.encode() for details.
Function decode(blob, extensions)¶
Decode the blob, using the provided extensions (or the standard extensions if not given). Returns the decoded data. See BsdfSerializer.decode() for details.
Class BsdfSerializer(extensions)¶
Provides a BSDF serializer object with a particular set of extensions.
Method add_extension(extension)¶
Add an extension object to the serializer.
Method remove_extension(extension)¶
Remove an extension instance (and any extension with the same name).
Method encode(data)¶
Encode the data and return an ArrayBuffer representing the encoded data. Any ArrayBuffer and DataView objects present in the data are interpreted as byte blobs, while Uint8Array objects are interpreted as typed arrays.
Method decode(blob)¶
Decode the blob and return the decoded data. Any encoded byte blobs will be represented using DataView objects that provide a view (not a copy) on the input data. These can be mapped to an array with e.g. a = new Uint8Array(bytes.buffer, bytes.byteOffset, bytes.byteLength), where bytes is such a DataView. If needed, a copy can be made with a = new Uint8Array(a).
Extensions¶
Extensions are represented by objects that have the following attributes:
- a string name indicating the identifier of the extension.
- a function match(s, v) that is called with a serializer object and a value, and should return true if the extension should be used.
- a function encode(s, v) that converts a value to more primitive objects.
- a function decode(s, v) that converts primitive objects into the intended form.
BSDF Matlab/Octave implementation¶
This is the implementation of the BSDF format for Matlab/Octave. It is in good shape and well tested, though it could do with some love from a Matlab expert to optimize the code and/or improve the implementation, e.g. by allowing custom extensions.
Installation¶
Download Bsdf.m and place it in a directory where Matlab can find it, e.g. by doing:
addpath('/path/to/bsdf');
Usage¶
Functionality is provided via a single Bsdf class:
>> bsdf = Bsdf()
>> b = bsdf.encode({'just some objects', struct('foo', true, 'bar', []), 42.001});
>> size(b)
ans =
48 1
>> bsdf.decode(b)
ans =
{
[1,1] = just some objects
[1,2] =
scalar structure containing the fields:
foo = 1
bar = [](0x0)
[1,3] = 42.001
}
Reference:¶
Class Bsdf()¶
This class represents the main API to use BSDF in Matlab.
Options (for writing) are provided as object properties:
- compression: the compression for binary blobs, 0 for raw, 1 for zlib (not available in Octave).
- float64: whether to export floats as 64 bit (default) or 32 bit.
- use_checksum: whether to write checksums for binary blobs, not yet implemented.
Method save(filename, data)¶
Save data to a file.
Method load(filename)¶
Load data from a file.
Method encode(data)¶
Serialize data to bytes. Returns a blob of bytes (a uint8 array).
Method decode(blob)¶
Load data from bytes.
BSDF Python implementation¶
This is the reference implementation of BSDF, with support for streamed reading and writing, and lazy loading of binary blobs. See also the minimal version of BSDF in Python.
Installation¶
Installing via pip will install bsdf.py as well as the CLI:
$ pip install bsdf
Alternatively, one can copy bsdf.py to a directory on your PYTHONPATH. Copy bsdf_cli.py along to be able to use the CLI.
There are no dependencies except Python 2.7 or Python 3.4+.
Usage¶
Simple use:
import bsdf
# Encode
bb = bsdf.encode(my_object)
# Decode
my_object2 = bsdf.decode(bb)
Example advanced use:
import bsdf
class MyFunctionExtension(bsdf.Extension):
""" An extension that can encode function objects and reload them if the
function is in the global scope.
"""
name = 'my.func'
def match(self, s, f):
return callable(f)
def encode(self, s, f):
return f.__name__
def decode(self, s, name):
return globals()[name] # in reality, one would do a smarter lookup here
# Setup a serializer with extensions and options
serializer = bsdf.BsdfSerializer([MyFunctionExtension],
compression='bz2')
def foo():
print(42)
# Use it
bb = serializer.encode(foo)
foo2 = serializer.decode(bb)
foo2()  # prints 42
For more examples, see the Python example notebook.
Reference:¶
function encode(ob, extensions=None, **options)¶
Save (BSDF-encode) the given object to bytes. See BsdfSerializer for details on extensions and options.
function decode(bb, extensions=None, **options)¶
Load a (BSDF-encoded) structure from bytes. See BsdfSerializer for details on extensions and options.
function save(f, ob, extensions=None, **options)¶
Save (BSDF-encode) the given object to the given filename or file object. See BsdfSerializer for details on extensions and options.
function load(f, extensions=None, **options)¶
Load a (BSDF-encoded) structure from the given filename or file object. See BsdfSerializer for details on extensions and options.
class BsdfSerializer(extensions=None, **options)¶
Instances of this class represent a BSDF encoder/decoder.
It acts as a placeholder for a set of extensions and encoding/decoding options. Use this to predefine extensions and options for high performance encoding/decoding. For general use, see the functions save(), encode(), load(), and decode().
This implementation of BSDF supports streaming lists (keep adding to a list after writing the main file), lazy loading of blobs, and in-place editing of blobs (for streams opened with a+).
Options for encoding:
- compression (int or str): 0 or "no" for no compression (default), 1 or "zlib" for Zlib compression (same as zip files and PNG), and 2 or "bz2" for Bz2 compression (more compact but slower writing). Note that some BSDF implementations (e.g. JavaScript) may not support compression.
- use_checksum (bool): whether to include a checksum with binary blobs.
- float64 (bool): Whether to write floats as 64 bit (default) or 32 bit.
Options for decoding:
- load_streaming (bool): if True, and the final object in the structure was a stream, will make it available as a stream in the decoded object.
- lazy_blob (bool): if True, bytes are represented as Blob objects that can be used to lazily access the data, and also overwrite the data if the file is open in a+ mode.
method add_extension(extension_class)¶
Add an extension to this serializer instance, which must be a subclass of Extension. Can be used as a decorator.
method remove_extension(name)¶
Remove an extension by its unique name.
method encode(ob)¶
Save the given object to bytes.
method save(f, ob)¶
Write the given object to the given file object.
method decode(bb)¶
Load the data structure that is BSDF-encoded in the given bytes.
method load(f)¶
Load a BSDF-encoded object from the given file object.
class Extension()¶
Base class to implement BSDF extensions for special data types.
Extension classes are provided to the BSDF serializer, which instantiates the class. That way, the extension can be somewhat dynamic: e.g. the NDArrayExtension exposes the ndarray class only when numpy is imported.
An extension instance must have two attributes. These can be attributes of the class, or of the instance set in __init__():
- name (str): the name by which encoded values will be identified.
- cls (type): the type (or list of types) to match values with. This is optional, but it makes the encoder select extensions faster.
Further, it needs 3 methods:
- match(serializer, value) -> bool: return whether the extension can convert the given value. The default is isinstance(value, self.cls).
- encode(serializer, value) -> encoded_value: the function to encode a value to more basic data types.
- decode(serializer, encoded_value) -> value: the function to decode an encoded value back to its intended representation.
class ListStream(mode='w')¶
A streamable list object used for writing or reading. In read mode, it can also be iterated over.
method append(item)¶
Append an item to the streaming list. The object is immediately serialized and written to the underlying file.
method close(unstream=False)¶
Close the stream, marking the number of written elements. New elements may still be appended, but they won't be read during decoding. If unstream is True, the stream is turned into a regular list (not streaming).
method next()¶
Read and return the next element in the streaming list. Raises StopIteration if the stream is exhausted.
class Blob(bb, compression=0, extra_size=0, use_checksum=False)¶
Object to represent a blob of bytes. When used to write a BSDF file, it's a wrapper for bytes plus properties such as what compression to apply. When used to read a BSDF file, it can be used to read the data lazily, and also modify the data if reading in 'r+' mode and the blob isn't compressed.
method seek(p)¶
Seek to the given position (relative to the blob start).
method tell()¶
Get the current file pointer position (relative to the blob start).
method write(bb)¶
Write bytes to the blob.
method read(n)¶
Read n bytes from the blob.
method get_bytes()¶
Get the contents of the blob as bytes.
method update_checksum()¶
Reset the blob's checksum if present. Call this after modifying the data.
BSDF Python lite implementation¶
This is a lightweight implementation of BSDF in Python. It is fully functional (including support for custom extensions), but has no fancy features like lazy loading or streaming. With fewer than 500 lines of code (including docstrings), it demonstrates how simple a BSDF implementation can be. See also the complete version of BSDF in Python.
Installation¶
Copy bsdf_lite.py to a place where Python can find it. There are no dependencies except Python 3.4+.
Usage¶
import bsdf_lite
# Setup a serializer with extensions and options
serializer = bsdf_lite.BsdfLiteSerializer(compression='bz2')
# Use it
bb = serializer.encode(my_object1)
my_object2 = serializer.decode(bb)
Reference:¶
class BsdfLiteSerializer(extensions=None, **options)¶
Instances of this class represent a BSDF encoder/decoder.
This is a lite variant of the Python BSDF serializer. It does not support lazy loading or streaming, but is otherwise fully functional, including support for custom extensions.
It acts as a placeholder for a set of extensions and encoding/decoding options. Options for encoding:
- compression (int or str): 0 or "no" for no compression (default), 1 or "zlib" for Zlib compression (same as zip files and PNG), and 2 or "bz2" for Bz2 compression (more compact but slower writing). Note that some BSDF implementations (e.g. JavaScript) may not support compression.
- use_checksum (bool): whether to include a checksum with binary blobs.
- float64 (bool): Whether to write floats as 64 bit (default) or 32 bit.
method add_extension(extension_class)¶
Add an extension to this serializer instance, which must be a subclass of Extension. Can be used as a decorator.
method remove_extension(name)¶
Remove an extension by its unique name.
method encode(ob)¶
Save the given object to bytes.
method save(f, ob)¶
Write the given object to the given file object.
method decode(bb)¶
Load the data structure that is BSDF-encoded in the given bytes.
method load(f)¶
Load a BSDF-encoded object from the given file object.
class Extension()¶
Base class to implement BSDF extensions for special data types.
Extension classes are provided to the BSDF serializer, which instantiates the class. That way, the extension can be somewhat dynamic: e.g. the NDArrayExtension exposes the ndarray class only when numpy is imported.
An extension instance must have two attributes. These can be attributes of the class, or of the instance set in __init__():
- name (str): the name by which encoded values will be identified.
- cls (type): the type (or list of types) to match values with. This is optional, but it makes the encoder select extensions faster.
Further, it needs 3 methods:
- match(serializer, value) -> bool: return whether the extension can convert the given value. The default is isinstance(value, self.cls).
- encode(serializer, value) -> encoded_value: the function to encode a value to more basic data types.
- decode(serializer, encoded_value) -> value: the function to decode an encoded value back to its intended representation.