lagom¶
inte för mycket och inte för lite, enkelhet är bästnot too much and not too little, simplicity is often the best
lagom is a light PyTorch infrastructure to quickly prototype reinforcement learning algorithms.
lagom balances between the flexibility and the userability when developing reinforcement learning (RL) algorithms. The library is built on top of PyTorch and provides modular tools to quickly prototype RL algorithms. However, we do not go overboard, because going too low level is rather time consuming and prone to potential bugs, while going too high level degrades the flexibility which makes it difficult to try out some crazy ideas.
We are continuously making lagom more ‘self-contained’ to run experiments quickly. Now, it internally supports base classes for multiprocessing (master-worker framework) to parallelize (e.g. experiments and evolution strategies). It also supports hyperparameter search by defining configurations either as grid search or random search.
One of the main pipelines to use lagom can be done as following:
- Define environment and RL agent
- User runner to collect data for agent
- Define algorithm to train agent
- Define experiment and configurations.
A graphical illustration is coming soon.
lagom¶
Agent¶
-
class
lagom.
BaseAgent
(config, env, device, **kwargs)[source]¶ Base class for all agents.
The agent could select an action from some data (e.g. observation) and update itself by defining a certain learning mechanism.
Any agent should subclass this class, e.g. policy-based or value-based.
Parameters: - config (dict) – a dictionary of configurations
- env (Env) – environment object.
- device (Device) – a PyTorch device
- **kwargs – keyword aguments used to specify the agent
-
choose_action
(x, **kwargs)[source]¶ Returns the selected action given the data.
Note
It’s recommended to handle all dtype/device conversions between CPU/GPU or Tensor/Numpy here.
The output is a dictionary containing useful items,
Parameters: - obs (object) – batched observation returned from the environment. First dimension is treated as batch dimension.
- **kwargs – keyword arguments to specify action selection.
Returns: - a dictionary of action selection output. It contains all useful information (e.g. action,
action_logprob, state_value). This allows the API to be generic and compatible with different kinds of runner and agents.
Return type: dict
-
learn
(D, **kwargs)[source]¶ Defines learning mechanism to update the agent from a batched data.
Parameters: - D (list) – a list of batched data to train the agent e.g. in policy gradient, this can be
a list of
Trajectory
. - **kwargs – keyword arguments to specify learning mechanism
Returns: a dictionary of learning output. This could contain the loss and other useful metrics.
Return type: dict
- D (list) – a list of batched data to train the agent e.g. in policy gradient, this can be
a list of
-
class
lagom.
RandomAgent
(config, env, device, **kwargs)[source]¶ A random agent samples action uniformly from action space.
-
choose_action
(x, **kwargs)[source]¶ Returns the selected action given the data.
Note
It’s recommended to handle all dtype/device conversions between CPU/GPU or Tensor/Numpy here.
The output is a dictionary containing useful items,
Parameters: - obs (object) – batched observation returned from the environment. First dimension is treated as batch dimension.
- **kwargs – keyword arguments to specify action selection.
Returns: - a dictionary of action selection output. It contains all useful information (e.g. action,
action_logprob, state_value). This allows the API to be generic and compatible with different kinds of runner and agents.
Return type: dict
-
learn
(D, **kwargs)[source]¶ Defines learning mechanism to update the agent from a batched data.
Parameters: - D (list) – a list of batched data to train the agent e.g. in policy gradient, this can be
a list of
Trajectory
. - **kwargs – keyword arguments to specify learning mechanism
Returns: a dictionary of learning output. This could contain the loss and other useful metrics.
Return type: dict
- D (list) – a list of batched data to train the agent e.g. in policy gradient, this can be
a list of
-
Data¶
Logger¶
-
class
lagom.
Logger
[source]¶ Log the information in a dictionary.
If a key is logged more than once, then the new value will be appended to a list.
Note
It uses pickle to serialize the data. Empirically,
pickle
is 2x faster thannumpy.save
and other alternatives likeyaml
is too slow andJSON
does not support numpy array.Warning
It is discouraged to store hierarchical structure, e.g. list of dict of list of ndarray. Because pickling such complex and large data structure is extremely slow. Put dictionary only at the topmost level. Large numpy array should be saved separately.
Example:
Default:
>>> logger = Logger() >>> logger('iteration', 1) >>> logger('train_loss', 0.12) >>> logger('iteration', 2) >>> logger('train_loss', 0.11) >>> logger('iteration', 3) >>> logger('train_loss', 0.09) >>> logger OrderedDict([('iteration', [1, 2, 3]), ('train_loss', [0.12, 0.11, 0.09])]) >>> logger.dump() Iteration: [1, 2, 3] Train Loss: [0.12, 0.11, 0.09]
With indentation:
>>> logger.dump(indent=1) Iteration: [1, 2, 3] Train Loss: [0.12, 0.11, 0.09]
With specific keys:
>>> logger.dump(keys=['iteration']) Iteration: [1, 2, 3]
With specific index:
>>> logger.dump(index=0) Iteration: 1 Train Loss: 0.12
With specific list of indices:
>>> logger.dump(index=[0, 2]) Iteration: [1, 3] Train Loss: [0.12, 0.09]
-
__call__
(key, value)[source]¶ Log the information with given key and value.
Note
The key should be semantic and each word is separated by
_
.Parameters: - key (str) – key of the information
- value (object) – value to be logged
-
dump
(keys=None, index=None, indent=0, border='')[source]¶ Dump the loggings to the screen.
Parameters: - keys (list, optional) – a list of selected keys. If
None
, then use all keys. Default:None
- index (int/list, optional) –
the index of logged values. It has following use cases:
scalar
: a specific index. If-1
, then use last element.list
: a list of indicies.None
: all indicies.
- indent (int, optional) – the number of tab indentation. Default:
0
- border (str, optional) – the string to print as header and footer
- keys (list, optional) – a list of selected keys. If
Engine¶
-
class
lagom.
BaseEngine
(config, **kwargs)[source]¶ Base class for all engines.
It defines the training and evaluation process.
-
eval
(n=None, **kwargs)[source]¶ Evaluation process for one iteration.
Note
It is recommended to use
Logger
to store loggings.Note
All parameterized modules should be called .eval() to specify evaluation mode.
Parameters: - n (int, optional) – n-th iteration for evaluation.
- **kwargs – keyword aguments used for logging.
Returns: a dictionary of evluation output
Return type: dict
-
train
(n=None, **kwargs)[source]¶ Training process for one iteration.
Note
It is recommended to use
Logger
to store loggings.Note
All parameterized modules should be called .train() to specify training mode.
Parameters: - n (int, optional) – n-th iteration for training.
- **kwargs – keyword aguments used for logging.
Returns: a dictionary of training output
Return type: dict
-
Runner¶
Evolution Strategies¶
-
class
lagom.
BaseES
[source]¶ Base class for all evolution strategies.
Note
The optimization is treated as minimization. e.g. maximize rewards is equivalent to minimize negative rewards.
Note
For painless parallelization, we highly recommend to use concurrent.futures.ProcessPoolExecutor with a few practical tips.
- Set max_workers argument to control the max parallelization capacity.
- When execution get stuck, try to use
CloudpickleWrapper
to wrap the objective function e.g. particularly for lambda, class methods - Use with ProcessPoolExecutor once to wrap entire iterative ES generations. Because using this internally for each generation, it can slow down the parallelization dramatically due to overheads.
- To reduce overheads further (e.g. PyTorch models, gym environments)
- Recreate such models for each generation will be very expensive.
- Use initializer function for ProcessPoolExecutor
- Within initializer function, define PyTorch models and gym environments as global variables Note that the global variables are defined to each worker independently
- Don’t forget to use with torch.no_grad to increase forward pass speed.
-
ask
()[source]¶ Sample a set of new candidate solutions.
Returns: a list of sampled candidate solutions Return type: list
-
result
¶ Return a namedtuple of all results for the optimization.
It contains: * xbest: best solution evaluated * fbest: objective function value of the best solution * evals_best: evaluation count when xbest was evaluated * evaluations: evaluations overall done * iterations: number of iterations * xfavorite: distribution mean in “phenotype” space, to be considered as current best estimate of the optimum * stds: effective standard deviations
-
tell
(solutions, function_values)[source]¶ Update the parameters of the population for a new generation based on the values of the objective function evaluated for sampled solutions.
Parameters: - solutions (list/ndarray) – candidate solutions returned from
ask()
- function_values (list) – a list of objective function values evaluated for the sampled solutions.
- solutions (list/ndarray) – candidate solutions returned from
-
class
lagom.
CMAES
(x0, sigma0, opts=None)[source]¶ Implements CMA-ES algorithm.
Note
It is a wrapper of the original CMA-ES implementation.
Parameters: - x0 (list) – initial solution
- sigma0 (list) – initial standard deviation
- opts (dict) – a dictionary of options, e.g. [‘popsize’, ‘seed’]
-
ask
()[source]¶ Sample a set of new candidate solutions.
Returns: a list of sampled candidate solutions Return type: list
-
result
¶ Return a namedtuple of all results for the optimization.
It contains: * xbest: best solution evaluated * fbest: objective function value of the best solution * evals_best: evaluation count when xbest was evaluated * evaluations: evaluations overall done * iterations: number of iterations * xfavorite: distribution mean in “phenotype” space, to be considered as current best estimate of the optimum * stds: effective standard deviations
-
tell
(solutions, function_values)[source]¶ Update the parameters of the population for a new generation based on the values of the objective function evaluated for sampled solutions.
Parameters: - solutions (list/ndarray) – candidate solutions returned from
ask()
- function_values (list) – a list of objective function values evaluated for the sampled solutions.
- solutions (list/ndarray) – candidate solutions returned from
-
class
lagom.
CEM
(x0, sigma0, opts=None)[source]¶ -
ask
()[source]¶ Sample a set of new candidate solutions.
Returns: a list of sampled candidate solutions Return type: list
-
result
¶ Return a namedtuple of all results for the optimization.
It contains: * xbest: best solution evaluated * fbest: objective function value of the best solution * evals_best: evaluation count when xbest was evaluated * evaluations: evaluations overall done * iterations: number of iterations * xfavorite: distribution mean in “phenotype” space, to be considered as current best estimate of the optimum * stds: effective standard deviations
-
tell
(solutions, function_values)[source]¶ Update the parameters of the population for a new generation based on the values of the objective function evaluated for sampled solutions.
Parameters: - solutions (list/ndarray) – candidate solutions returned from
ask()
- function_values (list) – a list of objective function values evaluated for the sampled solutions.
- solutions (list/ndarray) – candidate solutions returned from
-
lagom.envs¶
-
class
lagom.envs.
RecordEpisodeStatistics
(env, deque_size=100)[source]¶ -
reset
(**kwargs)[source]¶ Resets the state of the environment and returns an initial observation.
Returns: the initial observation. Return type: observation (object)
-
step
(action)[source]¶ Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
Parameters: action (object) – an action provided by the agent Returns: agent’s observation of the current environment reward (float) : amount of reward returned after previous action done (bool): whether the episode has ended, in which case further step() calls will return undefined results info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning) Return type: observation (object)
-
-
class
lagom.envs.
NormalizeReward
(env, clip=10.0, gamma=0.99, constant_var=None)[source]¶ -
reset
()[source]¶ Resets the state of the environment and returns an initial observation.
Returns: the initial observation. Return type: observation (object)
-
step
(action)[source]¶ Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
Parameters: action (object) – an action provided by the agent Returns: agent’s observation of the current environment reward (float) : amount of reward returned after previous action done (bool): whether the episode has ended, in which case further step() calls will return undefined results info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning) Return type: observation (object)
-
-
class
lagom.envs.
TimeStepEnv
(env)[source]¶ -
reset
(**kwargs)[source]¶ Resets the state of the environment and returns an initial observation.
Returns: the initial observation. Return type: observation (object)
-
step
(action)[source]¶ Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
Parameters: action (object) – an action provided by the agent Returns: agent’s observation of the current environment reward (float) : amount of reward returned after previous action done (bool): whether the episode has ended, in which case further step() calls will return undefined results info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning) Return type: observation (object)
-
lagom.experiment¶
Config¶
-
class
lagom.experiment.
Config
(items, num_sample=1, keep_dict_order=False)[source]¶ Defines a set of configurations for the experiment.
The configuration includes the following possible items:
- Hyperparameters: learning rate, batch size etc.
- Experiment settings: training iterations, logging directory, environment name etc.
All items are stored in a dictionary. It is a good practice to semantically name each item e.g. network.lr indicates the learning rate of the neural network.
For hyperparameter search, we support both grid search (
Grid
) and random search (Sample
).Call
make_configs()
to generate a list of all configurations, each is assigned with a unique ID.note:
For random search over small positive float e.g. learning rate, it is recommended to use log-uniform distribution, i.e. .. math:: \text{logU}(a, b) \sim \exp(U(\log(a), \log(b))) An example: `np.exp(np.random.uniform(low=np.log(low), high=np.log(high)))` Because direct uniform sampling is very `numerically unstable`_.
Warning
The random seeds should not be set here. Instead, it should be handled by
BaseExperimentMaster
andBaseExperimentWorker
.Example:
>>> config = Config({'log.dir': 'some path', 'network.lr': Grid([1e-3, 5e-3]), 'env.id': Grid(['CartPole-v1', 'Ant-v2'])}, num_sample=1, keep_dict_order=False) >>> import pandas as pd >>> print(pd.DataFrame(config.make_configs())) ID env.id log.dir network.lr 0 0 CartPole-v1 some path 0.001 1 1 Ant-v2 some path 0.001 2 2 CartPole-v1 some path 0.005 3 3 Ant-v2 some path 0.005
Parameters: - items (dict) – a dictionary of all configuration items.
- num_sample (int) – number of samples for random configuration items.
If grid search is also provided, then the grid will be repeated
num_sample
of times. - keep_dict_order (bool) – if
True
, then each generated configuration has the same key ordering withitems
.
Run experiment¶
-
lagom.experiment.
run_experiment
(run, config, seeds, log_dir, max_workers, chunksize=1, use_gpu=False, gpu_ids=None)[source]¶ A convenient function to parallelize the experiment (master-worker pipeline).
It is implemented by using concurrent.futures.ProcessPoolExecutor
It automatically creates all subfolders for each pair of configuration and random seed to store the loggings of the experiment. The root folder is given by the user. Then all subfolders for each configuration are created with the name of their job IDs. Under each configuration subfolder, a set subfolders are created for each random seed (the random seed as folder name). Intuitively, an experiment could have following directory structure:
- logs - 0 # ID number - 123 # random seed - 345 - 567 - 1 - 123 - 345 - 567 - 2 - 123 - 345 - 567 - 3 - 123 - 345 - 567 - 4 - 123 - 345 - 567
Parameters: - run (function) – a function that defines an algorithm, it must take the arguments (config, seed, device, logdir)
- config (Config) – a
Config
object defining all configuration settings - seeds (list) – a list of random seeds
- log_dir (str) – a string to indicate the path to store loggings.
- max_workers (int) – argument for ProcessPoolExecutor. if None, then all experiments run serially.
- chunksize (int) – argument for Executor.map()
- use_gpu (bool) – if True, then use CUDA. Otherwise, use CPU.
- gpu_ids (list) – if None, then use all available GPUs. Otherwise, only use the GPU device defined in the list.
lagom.metric: Metrics¶
-
lagom.metric.
bootstrapped_returns
(gamma, rewards, last_V, reach_terminal)[source]¶ Return (discounted) accumulated returns with bootstrapping for a batch of episodic transitions.
Formally, suppose we have all rewards \((r_1, \dots, r_T)\), it computes
\[Q_t = r_t + \gamma r_{t+1} + \dots + \gamma^{T - t} r_T + \gamma^{T - t + 1} V(s_{T+1})\]Note
The state values for terminal states are masked out as zero !
-
lagom.metric.
td0_target
(gamma, rewards, Vs, last_V, reach_terminal)[source]¶ Calculate TD(0) targets of a batch of episodic transitions.
Let \(r_1, r_2, \dots, r_T\) be a list of rewards and let \(V(s_0), V(s_1), \dots, V(s_{T-1}), V(s_{T})\) be a list of state values including a last state value. Let \(\gamma\) be a discounted factor, the TD(0) targets are calculated as follows
\[r_t + \gamma V(s_t), \forall t = 1, 2, \dots, T\]Note
The state values for terminal states are masked out as zero !
-
lagom.metric.
td0_error
(gamma, rewards, Vs, last_V, reach_terminal)[source]¶ Calculate TD(0) errors of a batch of episodic transitions.
Let \(r_1, r_2, \dots, r_T\) be a list of rewards and let \(V(s_0), V(s_1), \dots, V(s_{T-1}), V(s_{T})\) be a list of state values including a last state value. Let \(\gamma\) be a discounted factor, the TD(0) errors are calculated as follows
\[\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)\]Note
The state values for terminal states are masked out as zero !
-
lagom.metric.
gae
(gamma, lam, rewards, Vs, last_V, reach_terminal)[source]¶ Calculate the Generalized Advantage Estimation (GAE) of a batch of episodic transitions.
Let \(\delta_t\) be the TD(0) error at time step \(t\), the GAE at time step \(t\) is calculated as follows
\[A_t^{\mathrm{GAE}(\gamma, \lambda)} = \sum_{k=0}^{\infty}(\gamma\lambda)^k \delta_{t + k}\]
lagom.networks: Networks¶
-
class
lagom.networks.
Module
(**kwargs)[source]¶ Wrap PyTorch nn.module to provide more helper functions.
-
from_vec
(x)[source]¶ Set the network parameters from a single flattened vector.
Parameters: x (Tensor) – A single flattened vector of the network parameters with consistent size.
-
load
(f)[source]¶ Load the network parameters from a file.
It complies with the recommended approach for saving a model in PyTorch documentation.
Parameters: f (str) – file path.
-
num_params
¶ Returns the total number of parameters in the neural network.
-
num_trainable_params
¶ Returns the total number of trainable parameters in the neural network.
-
num_untrainable_params
¶ Returns the total number of untrainable parameters in the neural network.
-
save
(f)[source]¶ Save the network parameters to a file.
It complies with the recommended approach for saving a model in PyTorch documentation.
Note
It uses the highest pickle protocol to serialize the network parameters.
Parameters: f (str) – file path.
-
-
lagom.networks.
ortho_init
(module, nonlinearity=None, weight_scale=1.0, constant_bias=0.0)[source]¶ Applies orthogonal initialization for the parameters of a given module.
Parameters: - module (nn.Module) – A module to apply orthogonal initialization over its parameters.
- nonlinearity (str, optional) – Nonlinearity followed by forward pass of the module. When nonlinearity
is not
None
, the gain will be calculated andweight_scale
will be ignored. Default:None
- weight_scale (float, optional) – Scaling factor to initialize the weight. Ignored when
nonlinearity
is notNone
. Default: 1.0 - constant_bias (float, optional) – Constant value to initialize the bias. Default: 0.0
Note
Currently, the only supported
module
are elementary neural network layers, e.g. nn.Linear, nn.Conv2d, nn.LSTM. The submodules are not supported.Example:
>>> a = nn.Linear(2, 3) >>> ortho_init(a)
-
lagom.networks.
linear_lr_scheduler
(optimizer, N, min_lr)[source]¶ Defines a linear learning rate scheduler.
Parameters: - optimizer (Optimizer) – optimizer
- N (int) – maximum bounds for the scheduling iteration e.g. total number of epochs, iterations or time steps.
- min_lr (float) – lower bound of learning rate
-
lagom.networks.
make_fc
(input_dim, hidden_sizes)[source]¶ Returns a ModuleList of fully connected layers.
Note
All submodules can be automatically tracked because it uses nn.ModuleList. One can use this function to generate parameters in
BaseNetwork
.Example:
>>> make_fc(3, [4, 5, 6]) ModuleList( (0): Linear(in_features=3, out_features=4, bias=True) (1): Linear(in_features=4, out_features=5, bias=True) (2): Linear(in_features=5, out_features=6, bias=True) )
Parameters: - input_dim (int) – input dimension in the first fully connected layer.
- hidden_sizes (list) – a list of hidden sizes, each for one fully connected layer.
Returns: A ModuleList of fully connected layers.
Return type: nn.ModuleList
-
lagom.networks.
make_cnn
(input_channel, channels, kernels, strides, paddings)[source]¶ Returns a ModuleList of 2D convolution layers.
Note
All submodules can be automatically tracked because it uses nn.ModuleList. One can use this function to generate parameters in
BaseNetwork
.Example:
>>> make_cnn(input_channel=3, channels=[16, 32], kernels=[4, 3], strides=[2, 1], paddings=[1, 0]) ModuleList( (0): Conv2d(3, 16, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (1): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1)) )
Parameters: - input_channel (int) – input channel in the first convolution layer.
- channels (list) – a list of channels, each for one convolution layer.
- kernels (list) – a list of kernels, each for one convolution layer.
- strides (list) – a list of strides, each for one convolution layer.
- paddings (list) – a list of paddings, each for one convolution layer.
Returns: A ModuleList of 2D convolution layers.
Return type: nn.ModuleList
-
lagom.networks.
make_transposed_cnn
(input_channel, channels, kernels, strides, paddings, output_paddings)[source]¶ Returns a ModuleList of 2D transposed convolution layers.
Note
All submodules can be automatically tracked because it uses nn.ModuleList. One can use this function to generate parameters in
BaseNetwork
.Example:
make_transposed_cnn(input_channel=3, channels=[16, 32], kernels=[4, 3], strides=[2, 1], paddings=[1, 0], output_paddings=[1, 0]) ModuleList( (0): ConvTranspose2d(3, 16, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (1): ConvTranspose2d(16, 32, kernel_size=(3, 3), stride=(1, 1)) )
Parameters: - input_channel (int) – input channel in the first transposed convolution layer.
- channels (list) – a list of channels, each for one transposed convolution layer.
- kernels (list) – a list of kernels, each for one transposed convolution layer.
- strides (list) – a list of strides, each for one transposed convolution layer.
- paddings (list) – a list of paddings, each for one transposed convolution layer.
- output_paddings (list) – a list of output paddings, each for one transposed convolution layer.
Returns: A ModuleList of 2D transposed convolution layers.
Return type: nn.ModuleList
-
class
lagom.networks.
MDNHead
(in_features, out_features, num_density, **kwargs)[source]¶ -
forward
(x)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
loss
(logit_pi, mean, std, target)[source]¶ Calculate the MDN loss function.
The loss function (negative log-likelihood) is defined by:
\[L = -\frac{1}{N}\sum_{n=1}^{N}\ln \left( \sum_{k=1}^{K}\prod_{d=1}^{D} \pi_{k}(x_{n, d}) \mathcal{N}\left( \mu_k(x_{n, d}), \sigma_k(x_{n,d}) \right) \right)\]For better numerical stability, we could use log-scale:
\[L = -\frac{1}{N}\sum_{n=1}^{N}\ln \left( \sum_{k=1}^{K}\exp \left\{ \sum_{d=1}^{D} \ln\pi_{k}(x_{n, d}) + \ln\mathcal{N}\left( \mu_k(x_{n, d}), \sigma_k(x_{n,d}) \right) \right\} \right)\]Note
One should always use the second formula via log-sum-exp trick. The first formula is numerically unstable resulting in +/-
Inf
andNaN
error.The log-sum-exp trick is defined by
\[\log\sum_{i=1}^{N}\exp(x_i) = a + \log\sum_{i=1}^{N}\exp(x_i - a)\]where \(a = \max_i(x_i)\)
Parameters: - logit_pi (Tensor) – the logit of mixing coefficients, shape [N, K, D]
- mean (Tensor) – mean of Gaussian mixtures, shape [N, K, D]
- std (Tensor) – standard deviation of Gaussian mixtures, shape [N, K, D]
- target (Tensor) – target tensor, shape [N, D]
Returns: calculated loss
Return type: Tensor
-
sample
(logit_pi, mean, std, tau=1.0)[source]¶ Sample from Gaussian mixtures using reparameterization trick.
- Firstly sample categorically over mixing coefficients to determine a specific Gaussian
- Then sample from selected Gaussian distribution
Parameters: - logit_pi (Tensor) – the logit of mixing coefficients, shape [N, K, D]
- mean (Tensor) – mean of Gaussian mixtures, shape [N, K, D]
- std (Tensor) – standard deviation of Gaussian mixtures, shape [N, K, D]
- tau (float) – temperature during sampling, it controls uncertainty. * If \(\tau > 1\): increase uncertainty * If \(\tau < 1\): decrease uncertainty
Returns: sampled data with shape [N, D]
Return type: Tensor
-
Recurrent Neural Networks¶
-
class
lagom.networks.
LSTMLayer
(cell, *cell_args)[source]¶ -
forward
(input, state)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
-
class
lagom.networks.
StackedLSTM
(num_layers, layer, first_layer_args, other_layer_args)[source]¶ -
forward
(input, states)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
RL components¶
-
class
lagom.networks.
CategoricalHead
(feature_dim, num_action, device, **kwargs)[source]¶ Defines a module for a Categorical (discrete) action distribution.
Example
>>> import torch >>> action_head = CategoricalHead(30, 4, 'cpu') >>> action_head(torch.randn(2, 30)) Categorical(probs: torch.Size([2, 4]))
Parameters: - feature_dim (int) – number of input features
- num_action (int) – number of discrete actions
- device (torch.device) – PyTorch device
- **kwargs – keyword arguments for more specifications.
-
forward
(x)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
class
lagom.networks.
DiagGaussianHead
(feature_dim, action_dim, device, std0, **kwargs)[source]¶ Defines a module for a diagonal Gaussian (continuous) action distribution which the standard deviation is state independent.
The network outputs the mean \(\mu(x)\) and the state independent logarithm of standard deviation \(\log\sigma\) (allowing to optimize in log-space, i.e. both negative and positive).
The standard deviation is obtained by applying exponential function \(\exp(x)\).
Example
>>> import torch >>> action_head = DiagGaussianHead(10, 4, 'cpu', 0.45) >>> action_dist = action_head(torch.randn(2, 10)) >>> action_dist.base_dist Normal(loc: torch.Size([2, 4]), scale: torch.Size([2, 4])) >>> action_dist.base_dist.stddev tensor([[0.4500, 0.4500, 0.4500, 0.4500], [0.4500, 0.4500, 0.4500, 0.4500]], grad_fn=<ExpBackward>)
Parameters: - feature_dim (int) – number of input features
- action_dim (int) – flat dimension of actions
- device (torch.device) – PyTorch device
- std0 (float) – initial standard deviation
- **kwargs – keyword arguments for more specifications.
-
forward
(x)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
lagom.transform: Transformations¶
-
class
lagom.transform.
Describe
(count: int, mean: float, std: float, min: float, max: float, repr_indent: int = 0, repr_prefix: str = None)[source]¶
-
lagom.transform.
interp_curves
(x, y)[source]¶ Piecewise linear interpolation of a discrete set of data points and generate new \(x-y\) values from the interpolated line.
It receives a batch of curves with \(x-y\) values, a global min and max of the x-axis are calculated over the entire batch and new x-axis values are generated to be applied to the interpolation function. Each interpolated curve will share the same values in x-axis.
Note
This is useful for plotting a set of curves with uncertainty bands where each curve has data points at different \(x\) values. To generate such plot, we need the set of \(y\) values with consistent \(x\) values.
Warning
Piecewise linear interpolation often can lead to more realistic uncertainty bands. Do not use polynomial interpolation which the resulting curve can be extremely misleading.
Example:
>>> import matplotlib.pyplot as plt >>> x1 = [4, 5, 7, 13, 20] >>> y1 = [0.25, 0.22, 0.53, 0.37, 0.55] >>> x2 = [2, 4, 6, 7, 9, 11, 15] >>> y2 = [0.03, 0.12, 0.4, 0.2, 0.18, 0.32, 0.39] >>> plt.scatter(x1, y1, c='blue') >>> plt.scatter(x2, y2, c='red') >>> new_x, new_y = interp_curves([x1, x2], [y1, y2], num_point=100) >>> plt.plot(new_x[0], new_y[0], 'blue') >>> plt.plot(new_x[1], new_y[1], 'red')
Parameters: - x (list) – a batch of x values.
- y (list) – a batch of y values.
- num_point (int) – number of points to generate from the interpolated line.
Returns: - a tuple of two lists. A list of interpolated x values (shared for the batch of curves)
and followed by a list of interpolated y values.
Return type: tuple
-
lagom.transform.
geometric_cumsum
(alpha, x)[source]¶ Calculate future accumulated sums for each element in a list with an exponential factor.
Given input data \(x_1, \dots, x_n\) and exponential factor \(\alpha\in [0, 1]\), it returns an array \(y\) with the same length and each element is calculated as following
\[y_i = x_i + \alpha x_{i+1} + \alpha^2 x_{i+2} + \dots + \alpha^{n-i-1}x_{n-1} + \alpha^{n-i}x_{n}\]Note
To gain the optimal runtime speed, we use
scipy.signal.lfilter
Example
>>> geometric_cumsum(0.1, [1, 2, 3, 4]) array([[1.234, 2.34 , 3.4 , 4. ]])
Parameters: - alpha (float) – exponential factor between zero and one.
- x (list) – input data
Returns: calculated data
Return type: ndarray
-
lagom.transform.
explained_variance
(y_true, y_pred, **kwargs)[source]¶ Computes the explained variance regression score.
It involves a fraction of variance that the prediction explains about the ground truth.
Let \(\hat{y}\) be the predicted output and let \(y\) be the ground truth output. Then the explained variance is estimated as follows:
\[\text{EV}(y, \hat{y}) = 1 - \frac{\text{Var}(y - \hat{y})}{\text{Var}(y)}\]The best score is \(1.0\), and lower values are worse. A detailed interpretation is as following:
- \(\text{EV} = 1\): perfect prediction
- \(\text{EV} = 0\): might as well have predicted zero
- \(\text{EV} < 0\): worse than just predicting zero
Note
It calls the function from
scikit-learn
which handles exceptions better e.g. zero division, batch size.Example
>>> explained_variance(y_true=[3, -0.5, 2, 7], y_pred=[2.5, 0.0, 2, 8]) 0.9571734475374732
>>> explained_variance(y_true=[[3, -0.5, 2, 7]], y_pred=[[2.5, 0.0, 2, 8]]) 0.9571734475374732
>>> explained_variance(y_true=[[0.5, 1], [-1, 1], [7, -6]], y_pred=[[0, 2], [-1, 2], [8, -5]]) 0.9838709677419355
>>> explained_variance(y_true=[[0.5, 1], [-1, 10], [7, -6]], y_pred=[[0, 2], [-1, 0.00005], [8, -5]]) 0.6704023148857179
Parameters: - y_true (list) – ground truth output
- y_pred (list) – predicted output
- **kwargs – keyword arguments to specify the estimation of the explained variance.
Returns: estimated explained variance
Return type: float
-
class
lagom.transform.
LinearSchedule
(initial, final, N, start=0)[source]¶ A linear scheduling from an initial to a final value over a certain timesteps, then the final value is fixed constantly afterwards.
Note
This could be useful for following use cases:
- Decay of epsilon-greedy: initialized with \(1.0\) and keep with
start
time steps, then linearly decay tofinal
overN
time steps, and then fixed constantly asfinal
afterwards. - Beta parameter in prioritized experience replay.
Note that for learning rate decay, one should use PyTorch
optim.lr_scheduler
instead.Example
>>> scheduler = LinearSchedule(initial=1.0, final=0.1, N=3, start=0) >>> [scheduler(i) for i in range(6)] [1.0, 0.7, 0.4, 0.1, 0.1, 0.1]
Parameters: - initial (float) – initial value
- final (float) – final value
- N (int) – number of scheduling timesteps
- start (int, optional) – the timestep to start the scheduling. Default: 0
- Decay of epsilon-greedy: initialized with \(1.0\) and keep with
-
lagom.transform.
rank_transform
(x, centered=True)[source]¶ Rank transformation of a vector of values. The rank has the same dimensionality as the vector. Each element in the rank indicates the index of the ascendingly sorted input. i.e.
ranks[i] = k
, it means i-th element in the input is \(k\)-th smallest value.Rank transformation reduce sensitivity to outliers, e.g. in OpenAI ES, gradient computation involves fitness values in the population, if there are outliers (too large fitness), it affects the gradient too much.
Note that a centered rank transformation to the range [-0.5, 0.5] is supported by an option.
Example
>>> rank_transform([3, 14, 1], centered=True) array([ 0. , 0.5, -0.5])
>>> rank_transform([3, 14, 1], centered=False) array([1, 2, 0])
Parameters: - x (list/ndarray) – a vector of values.
- centered (bool, optional) – if
True
, then centered the rank transformation to \([-0.5, 0.5]\). Defualt:True
Returns: an numpy array of ranks of input data
Return type: ndarray
-
class
lagom.transform.
PolyakAverage
(alpha)[source]¶ Keep a running average of a quantity via Polyak averaging.
Compared with estimating mean, it is more sentitive to recent changes.
Parameters: alpha (float) – factor to control the sensitivity to recent changes, in the range [0, 1]. Zero is most sensitive to recent change.
-
class
lagom.transform.
RunningMeanVar
(shape)[source]¶ Estimates sample mean and variance by using Chan’s method.
It supports for both scalar and multi-dimensional data, however, the input is expected to be batched. The first dimension is always treated as batch dimension.
Note
For better precision, we handle the data with np.float64.
Warning
To use estimated moments for standardization, remember to keep the precision np.float64 and calculated as ..math:frac{x - mu}{sqrt{sigma^2 + 10^{-8}}}.
Example
>>> f = RunningMeanVar(shape=()) >>> f([1, 2]) >>> f([3]) >>> f([4]) >>> f.mean 2.499937501562461 >>> f.var 1.2501499923440393
-
__call__
(x)[source]¶ Update the mean and variance given an additional batched data.
Parameters: x (object) – additional batched data.
-
n
¶ Returns the total number of samples so far.
-
-
lagom.transform.
smooth_filter
(x, window_length, polyorder, **kwargs)[source]¶ Smooth a sequence of noisy data points by applying Savitzky–Golay filter. It uses least squares to fit a polynomial with a small sliding window and use this polynomial to estimate the point in the center of the sliding window.
This is useful when a curve is highly noisy, smoothing it out leads to better visualization quality.
Example
>>> import matplotlib.pyplot as plt
>>> x = np.linspace(0, 4*2*np.pi, num=100) >>> y = x*(np.sin(x) + np.random.random(100)*4) >>> y2 = smooth_filter(y, window_length=31, polyorder=10)
>>> plt.plot(x, y) >>> plt.plot(x, y2, 'red')
Parameters: - x (list) – one-dimensional vector of scalar data points of a curve.
- window_length (int) – the length of the filter window
- polyorder (int) – the order of the polynomial used to fit the samples
Returns: an numpy array of smoothed curve data
Return type: ndarray
-
class
lagom.transform.
SegmentTree
(capacity, operation, identity_element)[source]¶ Defines a segment tree data structure.
It can be regarded as regular array, but with two major differences
- Value modification is slower: O(ln(capacity)) instead of O(1)
- Efficient reduce operation over contiguous subarray: O(ln(segment size))
Parameters: - capacity (int) – total number of elements, it must be a power of two.
- operation (lambda) – binary operation forming a group, e.g. sum, min
- identity_element (object) – identity element in the group, e.g. 0 for sum
-
class
lagom.transform.
SumTree
(capacity)[source]¶ Defines the sum tree for storing replay priorities.
Each leaf node contains priority value. Internal nodes maintain the sum of the priorities of all leaf nodes in their subtrees.
-
find_prefixsum_index
(prefixsum)[source]¶ Find the highest index i in the array such that sum(A[0] + A[1] + … + A[i - 1]) <= prefixsum
if array values are probabilities, this function efficiently sample indices according to the discrete probability.
Parameters: prefixsum (float) – prefix sum. Returns: highest index satisfying the prefixsum constraint Return type: int
-
lagom.utils: Utils¶
-
lagom.utils.
set_global_seeds
(seed)[source]¶ Set the seed for generating random numbers.
It sets the following dependencies with the given random seed:
- PyTorch
- Numpy
- Python random
Parameters: seed (int) – a given seed.
-
class
lagom.utils.
Seeder
(init_seed=0)[source]¶ A random seed generator.
Given an initial seed, the seeder can be called continuously to sample a single or a batch of random seeds.
Note
The seeder creates an independent RandomState to generate random numbers. It does not affect the RandomState in
np.random
.Example:
>>> seeder = Seeder(init_seed=0) >>> seeder(size=5) [209652396, 398764591, 924231285, 1478610112, 441365315]
Smart data type converter¶
Use Python multiprocessing library¶
-
class
lagom.utils.
ProcessWorker
(master_conn, worker_conn)[source]¶ Base class for all workers implemented with Python multiprocessing.Process.
It communicates with master via a Pipe connection. The worker is stand-by infinitely waiting for task from master, working and sending back result. When it receives a
close
command, it breaks the infinite loop and close the connection.
-
class
lagom.utils.
ProcessMaster
(worker_class, num_worker)[source]¶ Base class for all masters implemented with Python multiprocessing.Process.
It creates a number of workers each with an individual Process. The communication between master and each worker is via independent Pipe connection. The master assigns tasks to workers. When all tasks are done, it stops all workers and terminate all processes.
Note
If there are more tasks than workers, then tasks will be splitted into chunks. If there are less tasks than workers, then we reduce the number of workers to the number of tasks.
-
assign_tasks
(tasks)[source]¶ Assign a given list of tasks to the workers and return the received results.
Parameters: tasks (list) – a list of tasks Returns: received results Return type: object
-
Serialization¶
-
lagom.utils.
pickle_dump
(obj, f, ext='.pkl')[source]¶ Serialize an object using pickling and save in a file.
Note
It uses cloudpickle instead of pickle to support lambda function and multiprocessing. By default, the highest protocol is used.
Note
Except for pure array object, it is not recommended to use
np.save
because it is often much slower.Parameters: - obj (object) – a serializable object
- f (str/Path) – file path
- ext (str, optional) – file extension. Default: .pkl
-
lagom.utils.
pickle_load
(f)[source]¶ Read a pickled data from a file.
Parameters: f (str/Path) – file path
-
lagom.utils.
yaml_dump
(obj, f, ext='.yml')[source]¶ Serialize a Python object using YAML and save in a file.
Note
YAML is recommended to use for a small dictionary and it is super human-readable. e.g. configuration settings. For saving experiment metrics, it is better to use
pickle_dump()
.Note
Except for pure array object, it is not recommended to use
np.load
because it is often much slower.Parameters: - obj (object) – a serializable object
- f (str/Path) – file path
- ext (str, optional) – file extension. Default: .yml
Misc¶
-
lagom.utils.
color_str
(string, color, bold=False)[source]¶ Returns stylized string with coloring and bolding for printing.
Example:
>>> print(color_str('lagom', 'green', bold=True))
Parameters: - string (str) – input string
- color (str) – color name
- bold (bool, optional) – if
True
, then the string is bolded. Default:False
Returns: stylized string
Return type: out
-
lagom.utils.
timed
(color='green', bold=False)[source]¶ A decorator to print the total time of executing a body function.
Parameters: - color (str, optional) – color name. Default: ‘green’
- bold (bool, optional) – if
True
, then the verbose is bolded. Default:False
lagom.vis: Visualization¶
-
class
lagom.vis.
ImageViewer
(max_width=500)[source]¶ Display an image from an RGB array in an OpenGL window.
Example:
imageviewer = ImageViewer(max_width=500) image = np.asarray(Image.open('x.jpg')) imageviewer(image)
-
class
lagom.vis.
GridImage
(ncol=8, padding=2, pad_value=0)[source]¶ Generate a grid of images. The images can be iteratively added.
Example:
grid = GridImage(ncol=8, padding=5, pad_value=0) a = np.random.randint(0, 255+1, size=[10, 3, 64, 64]) grid.add(a) grid()
Reference:
Parameters: - ncol (int, optional) – Number of images to show in each row of the grid. Final grid size is [N/ncol, ncol]. Default: 8.
- padding (int, optional) – Number of paddings. Default: 2.
- pad_value (float, optional) – Padding value in the range [0, 255]. Black is 0 and white 255. Default: 0