zhusuan.sgmcmc

class SGMCMC

Bases: object

Base class for stochastic gradient MCMC (SGMCMC) algorithms.

SGMCMC is a class of MCMC algorithms that use stochastic gradients instead of the true gradients. To deal with the problems brought by the stochasticity in gradients, more sophisticated updating schemes, such as SGHMC and SGNHT, have been proposed. We provide four SGMCMC algorithms here: SGLD, PSGLD, SGHMC and SGNHT. For SGHMC and SGNHT, we support the 2nd-order integrators introduced in (Chen et al., 2015).

The implementation framework is similar to that of the HMC class. However, SGMCMC algorithms do not include a Metropolis update, and typically do not include hyperparameter adaptation.

The usage is the same as that of the HMC class. Running multiple SGMCMC chains in parallel is supported.

To use the sampler, the user first chooses the sampling method and its hyperparameters by instantiating one of the subclasses SGLD, PSGLD, SGHMC or SGNHT. Then the user creates a (list of) TensorFlow Variable(s) storing the initial sample, whose shape is chain axes + data axes. There can be an arbitrary number of chain axes followed by an arbitrary number of data axes. Next, the user provides a log_joint function which returns a tensor of shape chain axes, i.e., the log joint density for each chain. Alternatively, the user can provide a meta_bn instance as a description of log_joint. Finally, the user runs the operation returned by sample(), which updates the sample stored in the Variable.

The typical code for SGMCMC inference looks like:

sgmcmc = zs.SGHMC(learning_rate=2e-6, friction=0.2,
                  n_iter_resample_v=1000, second_order=True)
sample_op, sgmcmc_info = sgmcmc.sample(
    meta_bn, observed={'x': x, 'y': y}, latent={'w1': w1, 'w2': w2})

with tf.Session() as sess:
    for _ in range(n_iters):
        _, info = sess.run([sample_op, sgmcmc_info],
                           feed_dict=...)
        print("mean_k", info.mean_k)  # For SGHMC and SGNHT, optional

After getting the sample_op, the user can feed mini-batches to the data placeholders in observed so that the gradient is a stochastic gradient. Then the user runs the sample_op in the same way as with HMC.
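
As mentioned above, a log_joint function can be passed in place of meta_bn. The following sketch shows this interface together with latent Variables that have one chain axis; the shapes, the toy Gaussian model inside log_joint, and the learning rate are illustrative assumptions, not part of the API (zs.SGLD is assumed to be exported at package level like zs.SGHMC in the example above):

import tensorflow as tf
import zhusuan as zs

n_chains = 10
# Latent Variables store the current samples; shape = chain axes + data axes.
w1 = tf.Variable(tf.zeros([n_chains, 100, 50]), name="w1")
w2 = tf.Variable(tf.zeros([n_chains, 50, 1]), name="w2")

x = tf.placeholder(tf.float32, [None, 100])  # mini-batch placeholders
y = tf.placeholder(tf.float32, [None])

def log_joint(observed):
    # Must return a Tensor of shape [chain axes]: the log joint density
    # of each chain. The model below is only a placeholder.
    w1_, w2_ = observed['w1'], observed['w2']
    log_prior = (tf.reduce_sum(-0.5 * w1_ ** 2, axis=[-2, -1]) +
                 tf.reduce_sum(-0.5 * w2_ ** 2, axis=[-2, -1]))
    h = tf.tanh(tf.einsum('nd,cdk->cnk', observed['x'], w1_))
    y_mean = tf.squeeze(tf.einsum('cnk,cko->cno', h, w2_), -1)
    log_lik = tf.reduce_sum(-0.5 * (observed['y'] - y_mean) ** 2, axis=-1)
    return log_prior + log_lik

sgld = zs.SGLD(learning_rate=1e-5)
sample_op, sgmcmc_info = sgld.sample(
    log_joint, observed={'x': x, 'y': y}, latent={'w1': w1, 'w2': w2})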

sample(meta_bn, observed, latent)

Return the sampling Operation that runs an SGMCMC iteration and the statistics collected during it, given the log joint function (or a MetaBayesianNet instance), observed values and latent variables.

Parameters:
  • meta_bn – A function or a MetaBayesianNet instance. If it is a function, it accepts a dictionary argument of (string, Tensor) pairs, which are mappings from all StochasticTensor names in the model to their observed values. The function should return a Tensor, representing the log joint likelihood of the model. More conveniently, the user can also provide a MetaBayesianNet instance instead of directly providing a log_joint function. Then a log_joint function will be created so that log_joint(obs) = meta_bn.observe(**obs).log_joint().
  • observed – A dictionary of (string, Tensor) pairs. Mapping from names of observed StochasticTensors to their values.
  • latent – A dictionary of (string, Variable) pairs. Mapping from names of latent StochasticTensors to corresponding TensorFlow Variables for storing their initial values and samples.
Returns:

A TensorFlow Operation that runs an SGMCMC iteration, called sample_op.

Returns:

A namedtuple that records some useful values, called sgmcmc_info. Suppose the list of keys of latent dictionary is ['w1', 'w2']. Then the typical structure of sgmcmc_info is SGMCMCInfo(attr1={'w1': some value, 'w2': some value}, attr2={'w1': some value, 'w2': some value}, ...). Hence, sgmcmc_info.attr1 is a dictionary containing the quantity attr1 corresponding to each latent variable in the latent dictionary.

sgmcmc_info returned by any SGMCMC algorithm has an attribute q, representing the updated values of latent variables. To check out other attributes, see the documentation for the specific subclass below.
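
For example, continuing the SGHMC snippet above, the updated samples can be read from sgmcmc_info inside the training loop (the keys 'w1' and 'w2' follow the latent dictionary in that snippet):

# info is the evaluated namedtuple; info.q maps each latent name to its
# updated sample (a numpy array of shape chain axes + data axes).
_, info = sess.run([sample_op, sgmcmc_info], feed_dict=...)
w1_sample = info.q['w1']
w2_sample = info.q['w2']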

class SGLD(learning_rate)

Bases: zhusuan.sgmcmc.SGMCMC

Subclass of SGMCMC which implements Stochastic Gradient Langevin Dynamics (Welling & Teh, 2011) (SGLD) update. The updating equation implemented below follows Equation (3) in the paper.

Attributes of returned sgmcmc_info in SGMCMC.sample():

  • q - The updated values of latent variables.
Parameters:
  • learning_rate – A 0-D float32 Tensor. It can be either a constant or a placeholder for a decaying learning rate.
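
For instance, a placeholder can be used to decay the SGLD learning rate over iterations. A minimal sketch, assuming meta_bn, the placeholders x and y, the latent Variables w1 and w2, n_iters, and the mini-batches x_batch and y_batch are defined as in the example above; the polynomial decay schedule is an illustrative choice, not part of the API:

lr_ph = tf.placeholder(tf.float32, shape=[], name="lr")
sgld = zs.SGLD(learning_rate=lr_ph)
sample_op, sgld_info = sgld.sample(
    meta_bn, observed={'x': x, 'y': y}, latent={'w1': w1, 'w2': w2})

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for t in range(n_iters):
        lr_t = 1e-5 * (t + 1) ** (-0.55)  # Welling & Teh style decay
        sess.run(sample_op,
                 feed_dict={x: x_batch, y: y_batch, lr_ph: lr_t})
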
class PSGLD(learning_rate, preconditioner='rms', preconditioner_hparams=None)

Bases: zhusuan.sgmcmc.SGLD

Subclass of SGLD implementing preconditioned stochastic gradient Langevin dynamics, a variant proposed in (Li et al., 2015). We implement the RMSprop preconditioner (Equations (4) and (5) in the paper). Other preconditioners can be implemented similarly.

Attributes of returned sgmcmc_info in SGMCMC.sample():

  • q - The updated values of latent variables.
Parameters:
  • learning_rate – A 0-D float32 Tensor. It can be either a constant or a placeholder for a decaying learning rate.
class RMSPreconditioner

HParams
    alias of RMSHParams

default_hps = RMSHParams(decay=0.9, epsilon=0.001)
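
A sketch of overriding the RMSprop preconditioner hyperparameters. The fields of HParams follow the alias documented above, but treating RMSPreconditioner as importable from zhusuan.sgmcmc is an assumption; meta_bn and the other names follow the earlier examples:

from zhusuan.sgmcmc import PSGLD, RMSPreconditioner

hps = RMSPreconditioner.HParams(decay=0.95, epsilon=1e-3)
psgld = PSGLD(learning_rate=1e-4, preconditioner='rms',
              preconditioner_hparams=hps)
sample_op, psgld_info = psgld.sample(
    meta_bn, observed={'x': x, 'y': y}, latent={'w1': w1, 'w2': w2})
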
class SGHMC(learning_rate, friction=0.25, variance_estimate=0.0, n_iter_resample_v=20, second_order=True)

Bases: zhusuan.sgmcmc.SGMCMC

Subclass of SGMCMC which implements the Stochastic Gradient Hamiltonian Monte Carlo (Chen et al., 2014) (SGHMC) update. Compared to SGLD, it adds a momentum variable to the dynamics. Compared to naive HMC with stochastic gradients, which diverges, SGHMC simultaneously adds (often the same amount of) friction and noise so that the dynamics have a stationary distribution. The updating equation implemented below follows Equation (15) in the paper. The 2nd-order integrator introduced in (Chen et al., 2015) is supported.

In the following description, we refer to Eq.(*) as Equation (15) in the SGHMC paper.

Attributes of returned sgmcmc_info in SGMCMC.sample():

  • q - The updated values of latent variables.
  • mean_k - The mean kinetic energy of updated momentum variables corresponding to the latent variables. Each item is a scalar.
Parameters:
  • learning_rate – A 0-D float32 Tensor corresponding to \(\eta\) in Eq.(*). Note that it does not scale the same way as learning_rate in SGLD, since \(\eta=O(\epsilon^2)\) in Eq.(*) where \(\epsilon\) is the step size. When NaN occurs, please consider decreasing learning_rate.
  • friction – A 0-D float32 Tensor corresponding to \(\alpha\) in Eq.(*). A coefficient which simultaneously decays the momentum and adds additional noise (hence the name friction is not entirely accurate). A larger friction makes the stationary distribution closer to the true posterior, since it reduces the effect of stochasticity in the gradient, but slows down mixing of the MCMC chain.
  • variance_estimate – A 0-D float32 Tensor corresponding to \(\beta\) in Eq.(*). Just set it to zero if it is hard to estimate the gradient variance well. Note that variance_estimate must be smaller than friction.
  • n_iter_resample_v – A 0-D int32 Tensor. Every n_iter_resample_v calls to the sampling operation, the momentum variable is resampled from the corresponding normal distribution once. A smaller n_iter_resample_v may lead to a stationary distribution closer to the true posterior but slows down mixing. If you do not want the momentum variable to be resampled, set this parameter to None or 0.
  • second_order – A bool Tensor indicating whether to enable the 2nd-order integrator introduced in (Chen et al., 2015) or to use the ordinary 1st-order integrator.
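
As a rough guide to how these hyperparameters enter the dynamics, below is a schematic, NumPy-only version of the 1st-order update in Eq.(*). It is an illustration of the equation only, not the library's implementation (which operates on the TensorFlow Variables passed in latent):

import numpy as np

def sghmc_step(q, v, grad_log_post, eta, alpha, beta_hat, rng):
    """One schematic 1st-order SGHMC step following Eq.(*).

    grad_log_post(q) is a stochastic gradient of the log posterior
    (i.e. the negative of the stochastic gradient of the energy U);
    eta, alpha and beta_hat correspond to learning_rate, friction and
    variance_estimate above."""
    noise = rng.normal(size=q.shape) * np.sqrt(2.0 * (alpha - beta_hat) * eta)
    v = v + eta * grad_log_post(q) - alpha * v + noise
    q = q + v
    return q, v
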
class SGNHT(learning_rate, variance_extra=0.0, tune_rate=1.0, n_iter_resample_v=None, second_order=True, use_vector_alpha=True)

Bases: zhusuan.sgmcmc.SGMCMC

Subclass of SGMCMC which implements the Stochastic Gradient Nosé-Hoover Thermostat (Ding et al., 2014) (SGNHT) update. It is built upon SGHMC and automatically tunes the friction parameter \(\alpha\) of SGHMC by adding a new friction variable to the dynamics (with a slight abuse of notation: in SGNHT, \(\alpha\) refers only to the friction coefficient, and unlike in SGHMC the noise term is independent of it). The updating equation implemented below follows Algorithm 2 in the supplementary material of the paper. The 2nd-order integrator introduced in (Chen et al., 2015) is supported.

In the following description, we refer to Eq.(**) as the equation in Algorithm 2 in the SGNHT paper.

Attributes of returned sgmcmc_info in SGMCMC.sample():

  • q - The updated values of latent variables.
  • mean_k - The mean kinetic energy of updated momentum variables corresponding to the latent variables. If use_vector_alpha==True, each item has the same shape as the corresponding latent variable; else, each item is a scalar.
  • alpha - The values of friction variables \(\alpha\) corresponding to the latent variables. If use_vector_alpha==True, each item has the same shape as the corresponding latent variable; else, each item is a scalar.
Parameters:
  • learning_rate – A 0-D float32 Tensor corresponding to \(\eta\) in Eq.(**). Note that it does not scale the same way as learning_rate in SGLD, since \(\eta=O(\epsilon^2)\) in Eq.(**) where \(\epsilon\) is the step size. When NaN occurs, please consider decreasing learning_rate.
  • variance_extra – A 0-D float32 Tensor corresponding to \(a\) in Eq.(**), representing the additional noise added in the update (and the initial friction \(\alpha\) will be set to this value). Normally just set it to zero.
  • tune_rate – A 0-D float32 Tensor. This parameter does not appear in Eq.(**) (i.e., its value is implicitly set to 1), but values other than 1 are also valid. A higher tune_rate means a higher (multiplicative) rate of tuning the friction \(\alpha\).
  • n_iter_resample_v – A 0-D int32 Tensor. Every n_iter_resample_v calls to the sampling operation, the momentum variable is resampled from the corresponding normal distribution once. A smaller n_iter_resample_v may lead to a stationary distribution closer to the true posterior but slows down mixing. If you do not want the momentum variable to be resampled, set this parameter to None or 0.
  • second_order – A bool Tensor indicating whether to enable the 2nd-order integrator introduced in (Chen et al., 2015) or to use the ordinary 1st-order integrator.
  • use_vector_alpha – A bool Tensor indicating whether to use a vector friction \(\alpha\). If it is true, the friction has the same shape as the latent variable; that is, each component of the latent variable corresponds to an independently tunable friction. Otherwise, the friction is a scalar.
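
A sketch of constructing SGNHT with a per-component friction and monitoring how it is tuned; meta_bn, the placeholders, the latent Variables and n_iters are assumed to be defined as in the SGHMC example above:

sgnht = zs.SGNHT(learning_rate=2e-6, variance_extra=0.,
                 tune_rate=1., n_iter_resample_v=None,
                 second_order=True, use_vector_alpha=True)
sample_op, sgnht_info = sgnht.sample(
    meta_bn, observed={'x': x, 'y': y}, latent={'w1': w1, 'w2': w2})

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(n_iters):
        _, info = sess.run([sample_op, sgnht_info], feed_dict=...)
        # With use_vector_alpha=True, info.alpha['w1'] and
        # info.mean_k['w1'] have the same shape as w1.
        print(info.alpha['w1'].mean(), info.mean_k['w1'].mean())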