zhusuan.variational

Base class

class VariationalObjective(meta_bn, observed, latent=None, variational=None)

Bases: zhusuan.utils.TensorArithmeticMixin

The base class for variational objectives. You never use this class directly, but instead instantiate one of its subclasses by calling elbo(), importance_weighted_objective(), or klpq().

Parameters:
- meta_bn – A MetaBayesianNet instance or a log joint probability function. For the latter, it must accept a dictionary argument of (string, Tensor) pairs, which are mappings from all node names in the model to their observed values. The function should return a Tensor, representing the log joint likelihood of the model.
- observed – A dictionary of (string, Tensor) pairs. Mapping from names of observed stochastic nodes to their values.
- latent – A dictionary of (string, (Tensor, Tensor)) pairs. Mapping from names of latent stochastic nodes to their samples and log probabilities. latent and variational are mutually exclusive.
- variational – A BayesianNet instance that defines the variational family. variational and latent are mutually exclusive.
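
As a minimal sketch of the function form of meta_bn, the snippet below assumes a toy Gaussian model; the node names 'x' and 'z' and all shapes are hypothetical, not part of the library:

    import tensorflow as tf

    def log_joint(observed):
        # `observed` maps every node name in the model to a Tensor.
        x, z = observed['x'], observed['z']
        # Toy model (an assumption): z ~ N(0, I), x ~ N(z, I).
        log_pz = tf.reduce_sum(
            tf.distributions.Normal(0., 1.).log_prob(z), axis=-1)
        log_px_z = tf.reduce_sum(
            tf.distributions.Normal(z, 1.).log_prob(x), axis=-1)
        return log_pz + log_px_z

The matching latent dictionary would then map 'z' to a (samples, log probabilities) pair produced by the user's own sampling code.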

bn

The BayesianNet constructed by observing the meta_bn with samples from the variational posterior distributions. None if a log joint probability function is provided instead of meta_bn.

Note
This BayesianNet instance is useful when computing predictions with the approximate posterior distribution.
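
As a hedged illustration of that use, a prediction might read a node out of this net and average over the sample axis; the node name 'y', the sample dimension 0, and the objective instance name are assumptions:

    # Hypothetical sketch: read a downstream node from the observed
    # BayesianNet of some VariationalObjective instance `objective`
    # and average over the sample axis (both assumptions).
    y_pred = tf.reduce_mean(objective.bn['y'], axis=0)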

meta_bn

The inferred model. A MetaBayesianNet instance. None if a log joint probability function is given instead.

tensor

Returns the Tensor representing the value of the variational objective.

variational

The variational family. A BayesianNet instance. None if latent is given instead.
Exclusive KL divergence

elbo(meta_bn, observed, latent=None, axis=None, variational=None)

The evidence lower bound (ELBO) objective for variational inference. The returned value is an EvidenceLowerBoundObjective instance.

See EvidenceLowerBoundObjective for examples of usage.

Parameters:
- meta_bn – A MetaBayesianNet instance or a log joint probability function. For the latter, it must accept a dictionary argument of (string, Tensor) pairs, which are mappings from all node names in the model to their observed values. The function should return a Tensor, representing the log joint likelihood of the model.
- observed – A dictionary of (string, Tensor) pairs. Mapping from names of observed stochastic nodes to their values.
- latent – A dictionary of (string, (Tensor, Tensor)) pairs. Mapping from names of latent stochastic nodes to their samples and log probabilities. latent and variational are mutually exclusive.
- axis – The sample dimension(s) to reduce when computing the outer expectation in the objective. If None, no dimension is reduced.
- variational – A BayesianNet instance that defines the variational family. variational and latent are mutually exclusive.

Returns: An EvidenceLowerBoundObjective instance.
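
Either way of specifying the approximate posterior works. Below is a hedged sketch of the latent-dictionary form; log_joint, x, z_samples, and z_log_prob are placeholders from the user's own code:

    import zhusuan as zs

    # Hypothetical sketch of elbo() with the `latent` form: 'z' names a
    # latent node, and (z_samples, z_log_prob) come from user code.
    lower_bound = zs.variational.elbo(
        log_joint, observed={'x': x},
        latent={'z': (z_samples, z_log_prob)}, axis=0)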

class EvidenceLowerBoundObjective(meta_bn, observed, latent=None, axis=None, variational=None)

Bases: zhusuan.variational.base.VariationalObjective

The class that represents the evidence lower bound (ELBO) objective for variational inference. An instance of the class can be constructed by calling elbo():

    # lower_bound is an EvidenceLowerBoundObjective instance
    lower_bound = zs.variational.elbo(
        meta_bn, observed, variational=variational, axis=0)
Here meta_bn is a MetaBayesianNet instance representing the model to be inferred. variational is a BayesianNet instance that defines the variational family. axis is the index of the sample dimension used to estimate the expectation when computing the objective.

Instances of EvidenceLowerBoundObjective are Tensor-like. They can be automatically or manually cast into Tensors when fed into TensorFlow operators or used in computation with Tensors, or when the tensor property is accessed. They can also be evaluated like a Tensor:

    # evaluate the ELBO
    with tf.Session() as sess:
        print(sess.run(lower_bound, feed_dict=...))

Maximizing the ELBO wrt. variational parameters is equivalent to minimizing \(KL(q\|p)\), i.e., the KL divergence between the variational posterior (\(q\)) and the true posterior (\(p\)). However, this cannot be done directly by calling TensorFlow optimizers on the EvidenceLowerBoundObjective instance, because of the outer expectation in the true ELBO objective, while the ELBO value at hand is only a single- or few-sample estimate. The correct way to do this is by calling one of the gradient estimators provided by EvidenceLowerBoundObjective. Currently there are two of them:

- sgvb(): The Stochastic Gradient Variational Bayes (SGVB) estimator, also known as "the reparameterization trick" or "path derivative estimator".
- reinforce(): The score function estimator with variance reduction, also known as "REINFORCE", "NVIL", or "likelihood-ratio estimator".

Thus the typical code for doing variational inference is like:

    # choose a gradient estimator to return the surrogate cost
    cost = lower_bound.sgvb()
    # or
    # cost = lower_bound.reinforce()

    # optimize the surrogate cost wrt. variational parameters
    optimizer = tf.train.AdamOptimizer(learning_rate)
    infer_op = optimizer.minimize(cost, var_list=variational_parameters)
    with tf.Session() as sess:
        for _ in range(n_iters):
            _, lb = sess.run([infer_op, lower_bound], feed_dict=...)
Note
Don't directly optimize the EvidenceLowerBoundObjective instance wrt. variational parameters, i.e., parameters in \(q\). Instead, a proper gradient estimator should be chosen to produce the correct surrogate cost to minimize, as shown in the above code snippet.

On the other hand, the ELBO can be used for maximum likelihood learning of model parameters, as it is a lower bound of the marginal log likelihood of observed variables. Because the outer expectation in the ELBO is not related to model parameters, in this case it is fine to directly optimize the class instance:
    # optimize wrt. model parameters
    learn_op = optimizer.minimize(-lower_bound,
                                  var_list=model_parameters)
    # or
    # learn_op = optimizer.minimize(cost, var_list=model_parameters)
    # both ways are correct
Or we can do inference and learning jointly by optimizing over both variational and model parameters:

    # joint inference and learning
    infer_and_learn_op = optimizer.minimize(
        cost, var_list=model_and_variational_parameters)
Parameters:
- meta_bn – A MetaBayesianNet instance or a log joint probability function. For the latter, it must accept a dictionary argument of (string, Tensor) pairs, which are mappings from all node names in the model to their observed values. The function should return a Tensor, representing the log joint likelihood of the model.
- observed – A dictionary of (string, Tensor) pairs. Mapping from names of observed stochastic nodes to their values.
- latent – A dictionary of (string, (Tensor, Tensor)) pairs. Mapping from names of latent stochastic nodes to their samples and log probabilities. latent and variational are mutually exclusive.
- axis – The sample dimension(s) to reduce when computing the outer expectation in the objective. If None, no dimension is reduced.
- variational – A BayesianNet instance that defines the variational family. variational and latent are mutually exclusive.

bn

The BayesianNet constructed by observing the meta_bn with samples from the variational posterior distributions. None if a log joint probability function is provided instead of meta_bn.

Note
This BayesianNet instance is useful when computing predictions with the approximate posterior distribution.

meta_bn

The inferred model. A MetaBayesianNet instance. None if a log joint probability function is given instead.

reinforce(variance_reduction=True, baseline=None, decay=0.8)

Implements the score function gradient estimator for the ELBO, with optional variance reduction using a moving mean estimate or a "baseline". Also known as "REINFORCE" (Williams, 1992), "NVIL" (Mnih, 2014), and "likelihood-ratio estimator" (Glynn, 1990).

It works for all types of latent StochasticTensor instances.

Note
To use the reinforce() estimator, the is_reparameterized property of each reparameterizable latent StochasticTensor must be set to False.

Parameters:
- variance_reduction – Bool. Whether to use variance reduction. By default the learning signal is centered by subtracting a moving mean estimate of it. Users can pass an additional customized baseline using the baseline argument, in which case the return value will be a tuple of costs: the former for the gradient estimator, the latter for adapting the baseline.
- baseline – A Tensor that can broadcast to match the shape returned by log_joint. A trainable estimate of the scale of the ELBO value, typically dependent on the observed values, e.g., a neural network with the observed values as inputs. It is used in addition to the moving mean estimate.
- decay – Float. The moving average decay for variance normalization.

Returns: A Tensor. The surrogate cost for TensorFlow optimizers to minimize.
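
A hedged sketch of the baseline variant follows; the observed placeholder x, the one-layer baseline network, and the combined parameter collection are all assumptions, not prescribed by the API:

    # Hypothetical sketch: REINFORCE with a learned neural baseline.
    # `x` (observed values) and the network shape are assumptions.
    baseline = tf.squeeze(tf.layers.dense(x, 1), axis=-1)
    # With `baseline` given, reinforce() returns two costs (see above).
    cost, baseline_cost = lower_bound.reinforce(baseline=baseline)
    # Adapt variational and baseline parameters from the two costs.
    infer_op = optimizer.minimize(
        cost + baseline_cost,
        var_list=variational_and_baseline_parameters)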

sgvb()

Implements the Stochastic Gradient Variational Bayes (SGVB) gradient estimator for the ELBO, also known as the "reparameterization trick" or "path derivative estimator".

It only works for latent StochasticTensor instances that can be reparameterized (Kingma, 2013). For example, Normal and Concrete.

Note
To use the sgvb() estimator, the is_reparameterized property of each latent StochasticTensor must be True (which is the default setting when they are constructed).

Returns: A Tensor. The surrogate cost for TensorFlow optimizers to minimize.
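
For reference, here is a minimal sketch of a reparameterizable variational family that sgvb() can work with, modeled on the style of ZhuSuan's examples; the node name 'z', the layer sizes, and n_z / n_particles are assumptions:

    # Hypothetical sketch of a mean-field Gaussian variational family.
    def build_variational(x, n_z, n_particles):
        bn = zs.BayesianNet()
        h = tf.layers.dense(x, 500, activation=tf.nn.relu)
        z_mean = tf.layers.dense(h, n_z)
        z_logstd = tf.layers.dense(h, n_z)
        # Normal nodes are reparameterized by default, as sgvb() requires.
        bn.normal('z', z_mean, logstd=z_logstd, group_ndims=1,
                  n_samples=n_particles)
        return bn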

tensor

Returns the Tensor representing the value of the variational objective.

variational

The variational family. A BayesianNet instance. None if latent is given instead.
Inclusive KL divergence

klpq(meta_bn, observed, latent=None, axis=None, variational=None)

The inclusive KL objective for variational inference. The returned value is an InclusiveKLObjective instance.

See InclusiveKLObjective for examples of usage.

Parameters:
- meta_bn – A MetaBayesianNet instance or a log joint probability function. For the latter, it must accept a dictionary argument of (string, Tensor) pairs, which are mappings from all node names in the model to their observed values. The function should return a Tensor, representing the log joint likelihood of the model.
- observed – A dictionary of (string, Tensor) pairs. Mapping from names of observed stochastic nodes to their values.
- latent – A dictionary of (string, (Tensor, Tensor)) pairs. Mapping from names of latent stochastic nodes to their samples and log probabilities. latent and variational are mutually exclusive.
- axis – The sample dimension(s) to reduce when computing the outer expectation in the objective. If None, no dimension is reduced.
- variational – A BayesianNet instance that defines the variational family. variational and latent are mutually exclusive.

Returns: An InclusiveKLObjective instance.

class InclusiveKLObjective(meta_bn, observed, latent=None, axis=None, variational=None)

Bases: zhusuan.variational.base.VariationalObjective

The class that represents the inclusive KL objective (\(KL(p\|q)\), i.e., the KL divergence between the true posterior \(p\) and the variational posterior \(q\)). This is the opposite direction of the one (\(KL(q\|p)\), or the exclusive KL objective) that induces the ELBO objective.

An instance of the class can be constructed by calling klpq():

    # klpq_obj is an InclusiveKLObjective instance
    klpq_obj = zs.variational.klpq(
        meta_bn, observed, variational=variational, axis=axis)
Here meta_bn is a MetaBayesianNet instance representing the model to be inferred. variational is a BayesianNet instance that defines the variational family. axis is the index of the sample dimension used to estimate the expectation when computing the gradients.

Unlike most VariationalObjective instances, an instance of InclusiveKLObjective cannot be used like a Tensor or evaluated, because in general this objective is not computable.

The only thing one can do with this objective is inference, i.e., optimizing it wrt. variational parameters (parameters in \(q\)). The way to perform this is by calling the supported gradient estimator and getting the surrogate cost to minimize. Currently there is one:

- importance(): The self-normalized importance sampling gradient estimator.

So the typical code for doing variational inference is like:
    # call the gradient estimator to return the surrogate cost
    cost = klpq_obj.importance()

    # optimize the surrogate cost wrt. variational parameters
    optimizer = tf.train.AdamOptimizer(learning_rate)
    infer_op = optimizer.minimize(cost, var_list=variational_parameters)
    with tf.Session() as sess:
        for _ in range(n_iters):
            _, c = sess.run([infer_op, cost], feed_dict=...)
Note
The inclusive KL objective is only a criterion for variational inference, not for model learning (optimizing it doesn't do maximum likelihood learning the way the ELBO objective does). That means there is no reason to optimize the surrogate cost wrt. model parameters.
Parameters:
- meta_bn – A MetaBayesianNet instance or a log joint probability function. For the latter, it must accept a dictionary argument of (string, Tensor) pairs, which are mappings from all node names in the model to their observed values. The function should return a Tensor, representing the log joint likelihood of the model.
- observed – A dictionary of (string, Tensor) pairs. Mapping from names of observed stochastic nodes to their values.
- latent – A dictionary of (string, (Tensor, Tensor)) pairs. Mapping from names of latent stochastic nodes to their samples and log probabilities. latent and variational are mutually exclusive.
- axis – The sample dimension(s) to reduce when computing the outer expectation in the objective. If None, no dimension is reduced.
- variational – A BayesianNet instance that defines the variational family. variational and latent are mutually exclusive.

bn

The BayesianNet constructed by observing the meta_bn with samples from the variational posterior distributions. None if a log joint probability function is provided instead of meta_bn.

Note
This BayesianNet instance is useful when computing predictions with the approximate posterior distribution.

importance()

Implements the self-normalized importance sampling gradient estimator for variational inference. This was used in the Reweighted Wake-Sleep (RWS) algorithm (Bornschein, 2015) to adapt the proposal, i.e., the variational posterior, in the importance weighted objective (see ImportanceWeightedObjective). This estimator is now widely used for training neural adaptive proposals in importance sampling.

It works for all types of latent StochasticTensor instances.

Note
To use the importance() estimator, the is_reparameterized property of each reparameterizable latent StochasticTensor must be set to False.

Returns: A Tensor. The surrogate cost for TensorFlow optimizers to minimize.
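
As a hedged reminder of the setup this estimator expects, the sketch below draws several samples per data point so the importance weights can be self-normalized along axis; it reuses the hypothetical build_variational from the sketch above, and n_particles is a placeholder:

    # Hypothetical sketch: self-normalized importance sampling needs
    # more than one sample along `axis` to normalize the weights.
    variational = build_variational(x, n_z, n_particles)
    klpq_obj = zs.variational.klpq(
        meta_bn, observed={'x': x}, variational=variational, axis=0)
    cost = klpq_obj.importance()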

meta_bn

The inferred model. A MetaBayesianNet instance. None if a log joint probability function is given instead.

rws()

(Deprecated) Alias of importance().

tensor

Returns the Tensor representing the value of the variational objective.

variational

The variational family. A BayesianNet instance. None if latent is given instead.
Monte Carlo objectives

importance_weighted_objective(meta_bn, observed, latent=None, axis=None, variational=None)

The importance weighted objective for variational inference (Burda, 2015). The returned value is an ImportanceWeightedObjective instance.

See ImportanceWeightedObjective for examples of usage.

Parameters:
- meta_bn – A MetaBayesianNet instance or a log joint probability function. For the latter, it must accept a dictionary argument of (string, Tensor) pairs, which are mappings from all node names in the model to their observed values. The function should return a Tensor, representing the log joint likelihood of the model.
- observed – A dictionary of (string, Tensor) pairs. Mapping from names of observed stochastic nodes to their values.
- latent – A dictionary of (string, (Tensor, Tensor)) pairs. Mapping from names of latent stochastic nodes to their samples and log probabilities. latent and variational are mutually exclusive.
- axis – The sample dimension(s) to reduce when computing the outer expectation in the objective. If None, no dimension is reduced.
- variational – A BayesianNet instance that defines the variational family. variational and latent are mutually exclusive.

Returns: An ImportanceWeightedObjective instance.

iw_objective(meta_bn, observed, latent=None, axis=None, variational=None)

Alias of importance_weighted_objective(). See that function for the parameters and examples of usage.

class ImportanceWeightedObjective(meta_bn, observed, latent=None, axis=None, variational=None)

Bases: zhusuan.variational.base.VariationalObjective

The class that represents the importance weighted objective for variational inference (Burda, 2015). An instance of the class can be constructed by calling importance_weighted_objective():

    # lower_bound is an ImportanceWeightedObjective instance
    lower_bound = zs.variational.importance_weighted_objective(
        meta_bn, observed, variational=variational, axis=axis)
Here meta_bn is a MetaBayesianNet instance representing the model to be inferred. variational is a BayesianNet instance that defines the variational family. axis is the index of the sample dimension used to estimate the expectation when computing the objective.

Instances of ImportanceWeightedObjective are Tensor-like. They can be automatically or manually cast into Tensors when fed into TensorFlow operations or used in computation with Tensors, or when the tensor property is accessed. They can also be evaluated like a Tensor:

    # evaluate the objective
    with tf.Session() as sess:
        print(sess.run(lower_bound, feed_dict=...))

The objective computes the same importance-sampling based estimate of the marginal log likelihood of observed variables as is_loglikelihood(). The difference is that the estimate here serves as a variational objective, since it is also a lower bound of the marginal log likelihood (as long as the number of samples is finite). The variational posterior here is in fact the proposal. As a variational objective, ImportanceWeightedObjective provides two gradient estimators for the variational (proposal) parameters:

- sgvb(): The Stochastic Gradient Variational Bayes (SGVB) estimator, also known as "the reparameterization trick" or "path derivative estimator".
- vimco(): The multi-sample score function estimator with variance reduction, also known as "VIMCO".

The typical code for joint inference and learning is like:

    # choose a gradient estimator to return the surrogate cost
    cost = lower_bound.sgvb()
    # or
    # cost = lower_bound.vimco()

    # optimize the surrogate cost wrt. model and variational
    # parameters
    optimizer = tf.train.AdamOptimizer(learning_rate)
    infer_and_learn_op = optimizer.minimize(
        cost, var_list=model_and_variational_parameters)
    with tf.Session() as sess:
        for _ in range(n_iters):
            _, lb = sess.run([infer_and_learn_op, lower_bound],
                             feed_dict=...)
Note
Don't directly optimize the ImportanceWeightedObjective instance wrt. variational parameters, i.e., parameters in \(q\). Instead, a proper gradient estimator should be chosen to produce the correct surrogate cost to minimize, as shown in the above code snippet.

Because the outer expectation in the objective is not related to model parameters, it's fine to directly optimize the class instance wrt. model parameters:

    # optimize wrt. model parameters
    learn_op = optimizer.minimize(-lower_bound,
                                  var_list=model_parameters)
    # or
    # learn_op = optimizer.minimize(cost, var_list=model_parameters)
    # both ways are correct
The above provides a way for users to combine the importance weighted objective with different methods of adapting the proposal (\(q\)). In this situation the true posterior is a good target for the proposal, which means any variational objective can be used for the adaptation. In particular, when the klpq() objective is chosen, this reproduces the Reweighted Wake-Sleep algorithm (Bornschein, 2015) for learning deep generative models, as sketched below.
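Below is a hedged sketch of that RWS-style combination; all names are placeholders, and whether the two ops are run together or alternately is a design choice:

    # Hypothetical sketch: learn model parameters with the importance
    # weighted objective, adapt the proposal with the inclusive KL.
    lower_bound = zs.variational.importance_weighted_objective(
        meta_bn, observed, variational=variational, axis=0)
    klpq_obj = zs.variational.klpq(
        meta_bn, observed, variational=variational, axis=0)
    learn_op = optimizer.minimize(-lower_bound,
                                  var_list=model_parameters)
    infer_op = optimizer.minimize(klpq_obj.importance(),
                                  var_list=variational_parameters)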
Parameters:
- meta_bn – A MetaBayesianNet instance or a log joint probability function. For the latter, it must accept a dictionary argument of (string, Tensor) pairs, which are mappings from all node names in the model to their observed values. The function should return a Tensor, representing the log joint likelihood of the model.
- observed – A dictionary of (string, Tensor) pairs. Mapping from names of observed stochastic nodes to their values.
- latent – A dictionary of (string, (Tensor, Tensor)) pairs. Mapping from names of latent stochastic nodes to their samples and log probabilities. latent and variational are mutually exclusive.
- axis – The sample dimension(s) to reduce when computing the outer expectation in the objective. If None, no dimension is reduced.
- variational – A BayesianNet instance that defines the variational family. variational and latent are mutually exclusive.

bn

The BayesianNet constructed by observing the meta_bn with samples from the variational posterior distributions. None if a log joint probability function is provided instead of meta_bn.

Note
This BayesianNet instance is useful when computing predictions with the approximate posterior distribution.

meta_bn

The inferred model. A MetaBayesianNet instance. None if a log joint probability function is given instead.

sgvb()

Implements the Stochastic Gradient Variational Bayes (SGVB) gradient estimator for the objective, also known as the "reparameterization trick" or "path derivative estimator". It was first used for importance weighted objectives in (Burda, 2015), where it is named "IWAE".

It only works for latent StochasticTensor instances that can be reparameterized (Kingma, 2013). For example, Normal and Concrete.

Note
To use the sgvb() estimator, the is_reparameterized property of each latent StochasticTensor must be True (which is the default setting when they are constructed).

Returns: A Tensor. The surrogate cost for TensorFlow optimizers to minimize.

tensor

Returns the Tensor representing the value of the variational objective.

variational

The variational family. A BayesianNet instance. None if latent is given instead.

vimco()

Implements the multi-sample score function gradient estimator for the objective, also known as "VIMCO", the name given by the authors of the original paper (Mnih, 2016).

It works for all kinds of latent StochasticTensor instances.

Note
To use the vimco() estimator, the is_reparameterized property of each reparameterizable latent StochasticTensor must be set to False.

Returns: A Tensor. The surrogate cost for TensorFlow optimizers to minimize.
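
As a closing hedged sketch, note that VIMCO is a multi-sample estimator, so the variational family should draw more than one sample along axis; the particle count and surrounding names are assumptions:

    # Hypothetical sketch: VIMCO needs several samples along `axis`
    # for its per-sample control variates (n_particles > 1 assumed).
    lower_bound = zs.variational.importance_weighted_objective(
        meta_bn, observed={'x': x}, variational=variational, axis=0)
    cost = lower_bound.vimco()
    infer_and_learn_op = optimizer.minimize(
        cost, var_list=model_and_variational_parameters)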