zhusuan.sgmcmc¶

class
SGMCMC
¶ Bases:
object
Base class for stochastic gradient MCMC (SGMCMC) algorithms.
SGMCMC is a class of MCMC algorithms which utilize stochastic gradients instead of the true gradients. To deal with the problems brought by the stochasticity in gradients, more sophisticated updating schemes, such as SGHMC and SGNHT, have been proposed. We provide four SGMCMC algorithms here: SGLD, PSGLD, SGHMC and SGNHT. For SGHMC and SGNHT, we support the 2nd-order integrators introduced in (Chen et al., 2015).
The implementation framework is similar to that of the HMC class. However, SGMCMC algorithms do not include a Metropolis update, and typically do not include hyperparameter adaptation either. The usage is the same as that of the HMC class. Running multiple SGMCMC chains in parallel is supported.
To use the sampler, the user first defines the sampling method and the corresponding hyperparameters by instantiating one of the subclasses SGLD, PSGLD, SGHMC or SGNHT. Then the user creates a (list of) TensorFlow Variable storing the initial sample, whose shape is chain axes + data axes. There can be an arbitrary number of chain axes followed by an arbitrary number of data axes. Next, the user provides a log_joint function which returns a tensor of shape chain axes, which is the log joint density for each chain. Alternatively, the user can provide a meta_bn instance as a description of log_joint. Finally, the user runs the operation returned by sample(), which updates the sample stored in the Variable.
The typical code for SGMCMC inference is like:
    sgmcmc = zs.SGHMC(learning_rate=2e-6, friction=0.2,
                      n_iter_resample_v=1000, second_order=True)
    sample_op, sgmcmc_info = sgmcmc.sample(meta_bn,
                                           observed={'x': x, 'y': y},
                                           latent={'w1': w1, 'w2': w2})
    with tf.Session() as sess:
        for _ in range(n_iters):
            _, info = sess.run([sample_op, sgmcmc_info],
                               feed_dict=...)
            print("mean_k", info.mean_k)  # optional; for SGHMC and SGNHT
After getting the sample_op, the user can feed minibatches to a data placeholder in observed, so that the gradient used in the update is a stochastic gradient. Then the user runs the sample_op in the same way as with HMC.
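For the stochastic gradient to be an unbiased estimate of the full-data gradient, the minibatch likelihood term in the log joint should be rescaled by N / n. The sketch below illustrates this scaling in plain Python with a hypothetical helper (the function and model names are illustrative, not part of the ZhuSuan API):

```python
import math

# Hypothetical helper (not part of the ZhuSuan API): a log joint estimated
# from a minibatch rescales the likelihood sum by N / n, so that its
# gradient is an unbiased estimate of the full-data gradient.
def stochastic_log_joint(theta, minibatch, n_total, log_prior, log_lik):
    """log p(theta) + (N / n) * sum_i log p(x_i | theta)."""
    scale = n_total / len(minibatch)
    return log_prior(theta) + scale * sum(log_lik(x, theta) for x in minibatch)

# Toy model: theta ~ N(0, 1), x_i | theta ~ N(theta, 1).
log_prior = lambda t: -0.5 * t * t - 0.5 * math.log(2 * math.pi)
log_lik = lambda x, t: -0.5 * (x - t) ** 2 - 0.5 * math.log(2 * math.pi)
```

In ZhuSuan this scaling is typically expressed inside the log_joint function (or the model definition) by multiplying the minibatch log likelihood by the dataset-to-batch size ratio.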

sample
(meta_bn, observed, latent)¶ Return the sampling Operation that runs an SGMCMC iteration and the statistics collected during it, given the log joint function (or a MetaBayesianNet instance), observed values and latent variables.
Parameters:
 meta_bn – A function or a MetaBayesianNet instance. If it is a function, it accepts a dictionary argument of (string, Tensor) pairs, which are mappings from all StochasticTensor names in the model to their observed values. The function should return a Tensor, representing the log joint likelihood of the model. More conveniently, the user can also provide a MetaBayesianNet instance instead of directly providing a log_joint function. Then a log_joint function will be created so that log_joint(obs) = meta_bn.observe(**obs).log_joint().
 observed – A dictionary of (string, Tensor) pairs. Mapping from names of observed StochasticTensor s to their values.
 latent – A dictionary of (string, Variable) pairs. Mapping from names of latent StochasticTensor s to corresponding TensorFlow Variables for storing their initial values and samples.
Returns: A TensorFlow Operation that runs an SGMCMC iteration, called sample_op.
Returns: A namedtuple that records some useful values, called sgmcmc_info. Suppose the list of keys of the latent dictionary is ['w1', 'w2']. Then the typical structure of sgmcmc_info is SGMCMCInfo(attr1={'w1': some value, 'w2': some value}, attr2={'w1': some value, 'w2': some value}, ...). Hence, sgmcmc_info.attr1 is a dictionary containing the quantity attr1 corresponding to each latent variable in the latent dictionary. The sgmcmc_info returned by any SGMCMC algorithm has an attribute q, representing the updated values of latent variables. To check out other attributes, see the documentation for the specific subclass below.
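To make the sgmcmc_info layout concrete, here is a minimal plain-Python sketch. The attribute set beyond q depends on the algorithm (mean_k is borrowed from SGHMC/SGNHT for illustration), and the values are placeholders; the real namedtuple is built by the library when sample() is called:

```python
from collections import namedtuple

# Hypothetical two-attribute layout mirroring the description above.
SGMCMCInfo = namedtuple("SGMCMCInfo", ["q", "mean_k"])

info = SGMCMCInfo(
    q={"w1": [0.1, 0.2], "w2": [0.3]},   # updated values of latent variables
    mean_k={"w1": 0.5, "w2": 0.48},      # per-latent mean kinetic energy
)

# Every attribute is a dict keyed by the latent variable names:
assert set(info.q) == set(info.mean_k) == {"w1", "w2"}
```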


class
SGLD
(learning_rate)¶ Bases:
zhusuan.sgmcmc.SGMCMC
Subclass of SGMCMC which implements the Stochastic Gradient Langevin Dynamics (SGLD) update (Welling & Teh, 2011). The updating equation implemented below follows Equation (3) in the paper.
Attributes of returned sgmcmc_info in
SGMCMC.sample()
: q  The updated values of latent variables.
Parameters: learning_rate – A 0D float32 Tensor. It can be either a constant or a placeholder for a decaying learning rate.
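A minimal scalar sketch of the SGLD update of Equation (3), written in plain Python rather than the TensorFlow graph code this class builds; grad_log_joint stands in for the (possibly stochastic) gradient of the log joint:

```python
import math
import random

def sgld_step(theta, grad_log_joint, learning_rate, rng=random):
    """One SGLD update (Welling & Teh, 2011, Eq. (3)):
    theta <- theta + (eps / 2) * grad + N(0, eps), with eps the step size."""
    eps = learning_rate
    noise = rng.gauss(0.0, math.sqrt(eps))
    return theta + 0.5 * eps * grad_log_joint(theta) + noise

# Usage: sample from a standard normal, whose grad log density is -theta.
random.seed(0)
theta = 0.0
for _ in range(1000):
    theta = sgld_step(theta, lambda t: -t, learning_rate=0.1)
```

Note the injected noise has variance equal to the step size, while the drift is scaled by half the step size; this is what makes the chain sample from the posterior rather than converge to a mode.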

class
PSGLD
(learning_rate, preconditioner='rms', preconditioner_hparams=None)¶ Bases:
zhusuan.sgmcmc.SGLD
Subclass of SGLD which implements preconditioned stochastic gradient Langevin dynamics (PSGLD), a variant proposed in (Li et al., 2015). We implement the RMSprop preconditioner (Equations (4-5) in the paper). Other preconditioners can be implemented similarly.
Attributes of returned sgmcmc_info in
SGMCMC.sample()
: q  The updated values of latent variables.
Parameters: learning_rate – A 0D float32 Tensor. It can be either a constant or a placeholder for a decaying learning rate.
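A scalar plain-Python sketch of one PSGLD step with an RMSprop-style preconditioner, under the common simplification of dropping the small correction term \(\Gamma\); this is illustrative only, not the class's TensorFlow implementation, and the hyperparameter names are assumptions:

```python
import math
import random

def psgld_step(theta, v, grad_log_joint, learning_rate,
               decay=0.99, epsilon=1e-5, rng=random):
    """One RMSprop-preconditioned SGLD step (sketch after Li et al., 2015).
    v tracks a running second moment of the gradient; the preconditioner
    G = 1 / (epsilon + sqrt(v)) rescales both the drift and the noise."""
    g = grad_log_joint(theta)
    v = decay * v + (1.0 - decay) * g * g
    precond = 1.0 / (epsilon + math.sqrt(v))
    eps = learning_rate
    noise = rng.gauss(0.0, math.sqrt(eps * precond))
    theta = theta + 0.5 * eps * precond * g + noise
    return theta, v
```

In practice it helps to initialize v at a positive value (or warm it up with a few gradient evaluations): starting from v = 0 makes the first preconditioner, and hence the first noise injection, enormous.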

class
SGHMC
(learning_rate, friction=0.25, variance_estimate=0.0, n_iter_resample_v=20, second_order=True)¶ Bases:
zhusuan.sgmcmc.SGMCMC
Subclass of SGMCMC which implements the Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) update (Chen et al., 2014). Compared to SGLD, it adds a momentum variable to the dynamics. Naive HMC with stochastic gradients diverges; SGHMC simultaneously adds friction and noise (often in the same amount) so that the dynamics have a stationary distribution. The updating equation implemented below follows Equation (15) in the paper. The 2nd-order integrator introduced in (Chen et al., 2015) is supported.
In the following description, we refer to Eq.(*) as Equation (15) in the SGHMC paper.
Attributes of returned sgmcmc_info in
SGMCMC.sample()
: q  The updated values of latent variables.
 mean_k  The mean kinetic energy of updated momentum variables corresponding to the latent variables. Each item is a scalar.
Parameters:
 learning_rate – A 0D float32 Tensor corresponding to \(\eta\) in Eq.(*). Note that it does not scale the same as learning_rate in SGLD, since \(\eta=O(\epsilon^2)\) in Eq.(*) where \(\epsilon\) is the step size. When NaN occurs, consider decreasing learning_rate.
 friction – A 0D float32 Tensor corresponding to \(\alpha\) in Eq.(*). A coefficient which simultaneously decays the momentum and adds additional noise (hence the name friction is not entirely accurate). Larger friction makes the stationary distribution closer to the true posterior, since it reduces the effect of the stochasticity in the gradient, but it slows down mixing of the MCMC chain.
 variance_estimate – A 0D float32 Tensor corresponding to \(\beta\) in Eq.(*). Just set it to zero if it is hard to estimate the gradient variance well. Note that variance_estimate must be smaller than friction.
 n_iter_resample_v – A 0D int32 Tensor. Every n_iter_resample_v calls to the sampling operation, the momentum variable is resampled from the corresponding normal distribution once. Smaller n_iter_resample_v may lead to a stationary distribution closer to the true posterior, but slows down mixing. If you do not want the momentum variable resampled, set the parameter to None or 0.
 second_order – A bool Tensor indicating whether to use the 2nd-order integrator introduced in (Chen et al., 2015) or the ordinary 1st-order integrator.
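A scalar plain-Python sketch of the 1st-order SGHMC update in the scaled variables of Eq.(*); the library's graph code, momentum resampling schedule and 2nd-order integrator are not reproduced here:

```python
import math
import random

def sghmc_step(theta, v, grad_log_joint, learning_rate,
               friction=0.25, variance_estimate=0.0, rng=random):
    """One 1st-order SGHMC update in scaled form (Chen et al., 2014, Eq. (15)):
    v     <- v + eta * grad_log_joint(theta) - alpha * v
                 + N(0, 2 * (alpha - beta) * eta)
    theta <- theta + v"""
    eta, alpha, beta = learning_rate, friction, variance_estimate
    noise = rng.gauss(0.0, math.sqrt(2.0 * (alpha - beta) * eta))
    v = v + eta * grad_log_joint(theta) - alpha * v + noise
    theta = theta + v
    return theta, v
```

In these scaled variables the momentum satisfies \(\mathbb{E}[v^2] \approx \eta\) at stationarity, which is roughly what the mean_k statistic tracks.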

class
SGNHT
(learning_rate, variance_extra=0.0, tune_rate=1.0, n_iter_resample_v=None, second_order=True, use_vector_alpha=True)¶ Bases:
zhusuan.sgmcmc.SGMCMC
Subclass of SGMCMC which implements the Stochastic Gradient Nosé-Hoover Thermostat (SGNHT) update (Ding et al., 2014). It is built upon SGHMC, and it tunes the friction parameter \(\alpha\) of SGHMC automatically by adding a new friction variable to the dynamics (with a slight abuse of notation: in SGNHT, \(\alpha\) refers only to the friction coefficient, and unlike in SGHMC the noise term is independent of it). The updating equation implemented below follows Algorithm 2 in the supplementary material of the paper. The 2nd-order integrator introduced in (Chen et al., 2015) is supported.
In the following description, we refer to Eq.(**) as the equation in Algorithm 2 in the SGNHT paper.
Attributes of returned sgmcmc_info in
SGMCMC.sample()
: q  The updated values of latent variables.
 mean_k  The mean kinetic energy of updated momentum variables corresponding to the latent variables. If use_vector_alpha==True, each item has the same shape as the corresponding latent variable; else, each item is a scalar.
 alpha  The values of friction variables \(\alpha\) corresponding to the latent variables. If use_vector_alpha==True, each item has the same shape as the corresponding latent variable; else, each item is a scalar.
Parameters:
 learning_rate – A 0D float32 Tensor corresponding to \(\eta\) in Eq.(**). Note that it does not scale the same as learning_rate in SGLD, since \(\eta=O(\epsilon^2)\) in Eq.(**) where \(\epsilon\) is the step size. When NaN occurs, consider decreasing learning_rate.
 variance_extra – A 0D float32 Tensor corresponding to \(a\) in Eq.(**), representing the additional noise added in the update (the initial friction \(\alpha\) is also set to this value). Normally just set it to zero.
 tune_rate – A 0D float32 Tensor. In Eq.(**), this parameter is not present (i.e. its value is implicitly set to 1), but a non1 value is also valid. Higher tune_rate represents higher (multiplicative) rate of tuning the friction \(\alpha\).
 n_iter_resample_v – A 0D int32 Tensor. Every n_iter_resample_v calls to the sampling operation, the momentum variable is resampled from the corresponding normal distribution once. Smaller n_iter_resample_v may lead to a stationary distribution closer to the true posterior, but slows down mixing. If you do not want the momentum variable resampled, set the parameter to None or 0.
 second_order – A bool Tensor indicating whether to use the 2nd-order integrator introduced in (Chen et al., 2015) or the ordinary 1st-order integrator.
 use_vector_alpha – A bool Tensor indicating whether to use a vector friction \(\alpha\). If it is true, then the friction has the same shape as the latent variable. That is, each component of the latent variable corresponds to an independently tunable friction. Else, the friction is a scalar.
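A scalar plain-Python sketch of the scaled SGNHT update (after Ding et al., 2014, Algorithm 2), in which the thermostat drives the time average of \(v^2\) toward \(\eta\). The exact scaling and variable layout of the library's implementation may differ, so treat this as illustrative only:

```python
import math
import random

def sgnht_step(theta, v, alpha, grad_log_joint, learning_rate,
               variance_extra=0.01, tune_rate=1.0, rng=random):
    """One scaled SGNHT update (sketch after Ding et al., 2014, Alg. 2):
    v     <- v + eta * grad_log_joint(theta) - alpha * v + N(0, 2 * a * eta)
    theta <- theta + v
    alpha <- alpha + tune_rate * (v**2 - eta)   # thermostat step"""
    eta, a = learning_rate, variance_extra
    noise = rng.gauss(0.0, math.sqrt(2.0 * a * eta))
    v = v + eta * grad_log_joint(theta) - alpha * v + noise
    theta = theta + v
    alpha = alpha + tune_rate * (v * v - eta)
    return theta, v, alpha
```

Initializing v near its equilibrium scale (e.g. at sqrt(eta)) and alpha at variance_extra avoids large thermostat transients. With use_vector_alpha=True, the same thermostat update is applied componentwise, so each coordinate of the latent variable gets its own adaptive friction.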