RWA-python documentation

RWA-python serializes Python datatypes and stores them in HDF5 files.

Code example

In module A referenced in sys.path:

class CustomClass(object):
    def __init__(self, arg=None):
        self.attr = arg

In module B:

from A import CustomClass
from rwa import HDF5Store

# make any complex construct
any_object = CustomClass((CustomClass('a'), dict(b=1)))

# serialize
hdf5 = HDF5Store('my_file.h5', 'w')
hdf5.poke('my object', any_object)
hdf5.close()

# deserialize
hdf5 = HDF5Store('my_file.h5', 'r')
reloaded_object = hdf5.peek('my object')
hdf5.close()
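If poke or peek raises an exception in the example above, close is never reached. A minimal sketch of a more defensive variant follows. It relies only on the poke, peek and close methods shown above, plus contextlib.closing from the standard library; the roundtrip helper and its store_cls argument are hypothetical names introduced for illustration and are not part of RWA-python.

```python
from contextlib import closing


def roundtrip(store_cls, path, key, obj):
    """Write `obj` under `key`, read it back, and always close the store.

    `store_cls` is expected to behave like rwa.HDF5Store in the example
    above: store_cls(path, mode) with poke, peek and close methods.
    """
    # closing() calls store.close() on exit, even if poke raises
    with closing(store_cls(path, 'w')) as store:
        store.poke(key, obj)
    # reopen read-only and return the deserialized object
    with closing(store_cls(path, 'r')) as store:
        return store.peek(key)
```

With the library installed, the call would read roundtrip(HDF5Store, 'my_file.h5', 'my object', any_object).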

Introduction

With Python3, RWA-python serialization is fully automatic for types that define __slots__ or whose __init__ constructor can be called without any argument.
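To make these two cases concrete, the sketch below defines one class of each kind, plus a counter-example. The looks_auto_serializable helper is hypothetical: it is not part of RWA-python and does not reproduce the library's internal checks. It only tests the two criteria stated above with the standard inspect module.

```python
import inspect


class Point(object):
    # Case 1: the attributes are declared in __slots__
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y


class Settings(object):
    # Case 2: __init__ can be called without any argument
    def __init__(self, verbose=False):
        self.verbose = verbose


class Opaque(object):
    # Neither case: no __slots__ and a mandatory constructor argument
    def __init__(self, handle):
        self.handle = handle


def looks_auto_serializable(cls):
    """Hypothetical helper: True if `cls` has __slots__ or if its
    constructor has no required argument besides self."""
    if hasattr(cls, '__slots__'):
        return True
    parameters = inspect.signature(cls.__init__).parameters
    required = [
        p for name, p in parameters.items()
        if name != 'self'
        and p.default is inspect.Parameter.empty
        # *args and **kwargs never have to be supplied
        and p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
    ]
    return not required
```

Types like Opaque are the ones most likely to need an explicit definition or to be ignored.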

The library generates serialization schemes for most custom types. When deserializing objects, it also looks for and loads the modules where the corresponding types are defined.

If RWA-python complains about a type that cannot be serialized, a partial fix is to make the library ignore this datatype:

hdf5_not_storable(type(unserializable_object))

With Python2, the library requires explicit definitions in most cases. In addition, string typing is sometimes problematic: strings that contain non-ASCII characters should be explicit unicode objects.

Installation

Python 2.7 or Python >= 3.5 is required.

pip should work just fine:

pip install --user rwa-python

pip install will install some Python dependencies if missing, but you may still need to install the HDF5 reference library.

Explicitly supported datatypes

  • any datatype supported by h5py
  • type
  • sequences and collections including tuple, list, frozenset, set, dict, namedtuple, deque, OrderedDict, Counter, defaultdict and memoryview
  • some pandas datatypes including Index, Int64Index, UInt64Index, Float64Index, RangeIndex, MultiIndex, Categorical, CategoricalIndex, Series, DataFrame and Panel (Panel is supported only with package tables available)
  • in scipy.sparse, types bsr_matrix, coo_matrix, csc_matrix, csr_matrix, dia_matrix, dok_matrix and lil_matrix
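As an illustration of the standard-library part of this list, the sketch below builds a nested construct that uses only collection types named above. The variable names and values are arbitrary. Such an object could then be passed to poke in the same way as any_object in the code example.

```python
from collections import Counter, OrderedDict, defaultdict, deque, namedtuple

# a namedtuple type, defined at module level like CustomClass in the code example
Sample = namedtuple('Sample', ['name', 'readings'])

groups = defaultdict(list)
groups['even'].append(2)

payload = {
    'pair': (1, 'one'),                          # tuple
    'tags': frozenset(['raw', 'checked']),       # frozenset
    'seen': set([3, 5]),                         # set
    'sample': Sample('s1', [1.0, 2.5]),          # namedtuple holding a list
    'recent': deque([1, 2, 3], maxlen=2),        # deque, keeps the last two items
    'ordered': OrderedDict([('first', 1), ('second', 2)]),
    'counts': Counter('abca'),                   # Counter
    'groups': groups,                            # defaultdict
}
```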

The following datatypes are implicitly supported with Python3 and are serialized in Python2 with explicit rules:

  • in scipy.spatial, types Delaunay, ConvexHull and Voronoi

Other datatypes are safely ignored, including built-in and user defined functions, class methods, etc.

Global parameters

Release 0.8 introduces global parameters that can be accessed with rwa.hdf5.hdf5_service.params or, more simply, rwa_params.

Currently available parameters are all Pandas-related.

Since tables became an optional dependency, RWA-python features its own native serialization rules for pandas datatypes. To use tables instead of the native rules, set rwa_params['pandas.use_tables'] = True.

The other parameters are 'pandas.index.force_unicode' and 'pandas.columns.force_unicode', both True by default to emulate the behaviour of tables.
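A configuration fragment that sets the three parameters might look as follows. It uses the rwa.hdf5.hdf5_service.params access path given above and assumes that the parameter object supports item assignment, as the rwa_params expression above suggests. The comment on the force_unicode parameters is an interpretation of their names, not documented behaviour.

```python
from rwa.hdf5 import hdf5_service

# same object as the one referred to as rwa_params in this section
params = hdf5_service.params

# delegate pandas serialization to the optional tables package
params['pandas.use_tables'] = True

# assumption: False leaves index and column labels as they are
# instead of converting them to unicode
params['pandas.index.force_unicode'] = False
params['pandas.columns.force_unicode'] = False
```

These are global settings, so they would normally be set once, before any store is opened.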

Known issues

[fixed in 0.8.3] Python2-serialized scipy.spatial.Delaunay can be deserialized in Python3 but not conversely.

pandas.CategoricalIndex support is broken.