zhusuan.variational

Base class

class VariationalObjective(meta_bn, observed, latent=None, variational=None)

Bases: zhusuan.utils.TensorArithmeticMixin

The base class for variational objectives. You never use this class directly, but instead instantiate one of its subclasses by calling elbo(), importance_weighted_objective(), or klpq().

Parameters:
  • meta_bn – A MetaBayesianNet instance or a log joint probability function. For the latter, it must accept a dictionary argument of (string, Tensor) pairs, which are mappings from all node names in the model to their observed values. The function should return a Tensor, representing the log joint likelihood of the model.
  • observed – A dictionary of (string, Tensor) pairs. Mapping from names of observed stochastic nodes to their values.
  • latent – A dictionary of (string, (Tensor, Tensor)) pairs. Mapping from names of latent stochastic nodes to their samples and log probabilities. latent and variational are mutually exclusive.
  • variational – A BayesianNet instance that defines the variational family. variational and latent are mutually exclusive.
bn

The BayesianNet constructed by observing the meta_bn with samples from the variational posterior distributions. None if the log joint probability function is provided instead of meta_bn.

Note

This BayesianNet instance is useful when computing predictions with the approximate posterior distribution.

meta_bn

The inferred model. A MetaBayesianNet instance. None if a log joint probability function is given instead.

tensor

Return the Tensor representing the value of the variational objective.

variational

The variational family. A BayesianNet instance. None if latent is given instead.

Exclusive KL divergence

elbo(meta_bn, observed, latent=None, axis=None, variational=None)

The evidence lower bound (ELBO) objective for variational inference. The returned value is an EvidenceLowerBoundObjective instance.

See EvidenceLowerBoundObjective for examples of usage.

Parameters:
  • meta_bn – A MetaBayesianNet instance or a log joint probability function. For the latter, it must accept a dictionary argument of (string, Tensor) pairs, which are mappings from all node names in the model to their observed values. The function should return a Tensor, representing the log joint likelihood of the model.
  • observed – A dictionary of (string, Tensor) pairs. Mapping from names of observed stochastic nodes to their values.
  • latent – A dictionary of (string, (Tensor, Tensor)) pairs. Mapping from names of latent stochastic nodes to their samples and log probabilities. latent and variational are mutually exclusive.
  • axis – The sample dimension(s) to reduce when computing the outer expectation in the objective. If None, no dimension is reduced.
  • variational – A BayesianNet instance that defines the variational family. variational and latent are mutually exclusive.
Returns:

An EvidenceLowerBoundObjective instance.
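
Besides passing a MetaBayesianNet together with a variational BayesianNet, elbo() also accepts a log joint probability function plus a latent dictionary, as described above. Below is a minimal sketch of this style; the names log_joint, x, z_samples and log_qz are illustrative, and the helper uses MetaBayesianNet.observe() and BayesianNet.cond_log_prob() only as one possible way to compute the log joint:

# a sketch of calling elbo() with a log joint function and a `latent`
# dictionary (illustrative names, not part of the API)
def log_joint(observed):
    # `observed` maps every node name to its observed or sampled value
    bn = model_meta_bn.observe(**observed)
    return bn.cond_log_prob("z") + bn.cond_log_prob("x")

# z_samples and log_qz are samples and log probabilities from the
# user's own variational posterior q(z|x)
lower_bound = zs.variational.elbo(
    log_joint, observed={"x": x},
    latent={"z": (z_samples, log_qz)}, axis=0)
cost = lower_bound.sgvb()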

class EvidenceLowerBoundObjective(meta_bn, observed, latent=None, axis=None, variational=None)

Bases: zhusuan.variational.base.VariationalObjective

The class that represents the evidence lower bound (ELBO) objective for variational inference. An instance of the class can be constructed by calling elbo():

# lower_bound is an EvidenceLowerBoundObjective instance
lower_bound = zs.variational.elbo(
    meta_bn, observed, variational=variational, axis=0)

Here meta_bn is a MetaBayesianNet instance representing the model to be inferred. variational is a BayesianNet instance that defines the variational family. axis is the index of the sample dimension used to estimate the expectation when computing the objective.

Instances of EvidenceLowerBoundObjective are Tensor-like. They can be automatically or manually cast into Tensors when fed into Tensorflow operations or used in computation with Tensors, or when the tensor property is accessed. They can also be evaluated like Tensors:

# evaluate the ELBO
with tf.Session() as sess:
    print(sess.run(lower_bound, feed_dict=...))

Maximizing the ELBO wrt. variational parameters is equivalent to minimizing \(KL(q\|p)\), i.e., the KL-divergence between the variational posterior (\(q\)) and the true posterior (\(p\)). However, this cannot be done by directly calling Tensorflow optimizers on the EvidenceLowerBoundObjective instance, because of the outer expectation in the true ELBO objective, while the ELBO value at hand is only a single- or few-sample estimate of it. The correct way is to call one of the gradient estimators provided by EvidenceLowerBoundObjective. Currently there are two of them:

  • sgvb(): The Stochastic Gradient Variational Bayes (SGVB) estimator, also known as “the reparameterization trick”, or “path derivative estimator”.
  • reinforce(): The score function estimator with variance reduction, also known as “REINFORCE”, “NVIL”, or “likelihood-ratio estimator”.

Thus the typical code for doing variational inference is like:

# choose a gradient estimator to return the surrogate cost
cost = lower_bound.sgvb()
# or
# cost = lower_bound.reinforce()

# optimize the surrogate cost wrt. variational parameters
optimizer = tf.train.AdamOptimizer(learning_rate)
infer_op = optimizer.minimize(cost, var_list=variational_parameters)
with tf.Session() as sess:
    for _ in range(n_iters):
        _, lb = sess.run([infer_op, lower_bound], feed_dict=...)

Note

Don’t directly optimize the EvidenceLowerBoundObjective instance wrt. variational parameters, i.e., parameters in \(q\). Instead a proper gradient estimator should be chosen to produce the correct surrogate cost to minimize, as shown in the above code snippet.

On the other hand, the ELBO can be used for maximum likelihood learning of model parameters, as it is a lower bound of the marginal log likelihood of observed variables. Because the outer expectation in the ELBO is not related to model parameters, this time it’s fine to directly optimize the class instance:

# optimize wrt. model parameters
learn_op = optimizer.minimize(-lower_bound, var_list=model_parameters)
# or
# learn_op = optimizer.minimize(cost, var_list=model_parameters)
# both ways are correct

Or we can do inference and learning jointly by optimizing over both variational and model parameters:

# joint inference and learning
infer_and_learn_op = optimizer.minimize(
    cost, var_list=model_and_variational_parameters)
Parameters:
  • meta_bn – A MetaBayesianNet instance or a log joint probability function. For the latter, it must accept a dictionary argument of (string, Tensor) pairs, which are mappings from all node names in the model to their observed values. The function should return a Tensor, representing the log joint likelihood of the model.
  • observed – A dictionary of (string, Tensor) pairs. Mapping from names of observed stochastic nodes to their values.
  • latent – A dictionary of (string, (Tensor, Tensor)) pairs. Mapping from names of latent stochastic nodes to their samples and log probabilities. latent and variational are mutually exclusive.
  • axis – The sample dimension(s) to reduce when computing the outer expectation in the objective. If None, no dimension is reduced.
  • variational – A BayesianNet instance that defines the variational family. variational and latent are mutually exclusive.
bn

The BayesianNet constructed by observing the meta_bn with samples from the variational posterior distributions. None if the log joint probability function is provided instead of meta_bn.

Note

This BayesianNet instance is useful when computing predictions with the approximate posterior distribution.
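
As an illustration (a minimal sketch; the node name "x" and the use of cond_log_prob() are assumptions for the example, not requirements of this class), quantities under the approximate posterior can be computed from bn:

# hypothetical usage of the `bn` attribute: evaluate the conditional
# log likelihood of the observed node "x" given samples of the latent
# variables drawn from the variational posterior
log_px_z = lower_bound.bn.cond_log_prob("x")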

meta_bn

The inferred model. A MetaBayesianNet instance. None if a log joint probability function is given instead.

reinforce(variance_reduction=True, baseline=None, decay=0.8)

Implements the score function gradient estimator for the ELBO, with optional variance reduction using a moving mean estimate or a "baseline". Also known as "REINFORCE" (Williams, 1992), "NVIL" (Mnih, 2014), and the "likelihood-ratio estimator" (Glynn, 1990).

It works for all types of latent StochasticTensor s.

Note

To use the reinforce() estimator, the is_reparameterized property of each reparameterizable latent StochasticTensor must be set False.

Parameters:
  • variance_reduction – Bool. Whether to use variance reduction. By default the learning signal is centered by subtracting a moving mean estimate of it. Users can pass an additional customized baseline using the baseline argument; in that case the return value will be a tuple of two costs, the former being the surrogate cost for the gradient estimator and the latter being the cost for adapting the baseline.
  • baseline – A Tensor that can broadcast to match the shape returned by log_joint. A trainable estimate of the scale of the ELBO value, typically dependent on the observed values, e.g., a neural network with the observed values as inputs. It is used in addition to the moving mean estimate.
  • decay – Float. The moving average decay for variance normalization.
Returns:

A Tensor. The surrogate cost for Tensorflow optimizers to minimize.
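
A minimal sketch of the workflow described above (the variational net construction and the baseline network are illustrative assumptions, including that the normal node accepts an is_reparameterized argument; x, n_z, n_samples, meta_bn, optimizer and variational_parameters are assumed to be defined as elsewhere on this page):

# build q(z|x) with reparameterization disabled, as required by
# reinforce() (see the note above)
variational = zs.BayesianNet()
z_mean = tf.layers.dense(x, n_z)
z_logstd = tf.layers.dense(x, n_z)
variational.normal("z", z_mean, logstd=z_logstd, group_ndims=1,
                   n_samples=n_samples, is_reparameterized=False)

lower_bound = zs.variational.elbo(
    meta_bn, observed={"x": x}, variational=variational, axis=0)

# an input-dependent baseline (hypothetical); when `baseline` is given,
# reinforce() returns two costs: the surrogate cost and the cost for
# fitting the baseline
baseline = tf.squeeze(tf.layers.dense(x, 1), axis=-1)
cost, baseline_cost = lower_bound.reinforce(baseline=baseline)
infer_op = optimizer.minimize(cost + baseline_cost,
                              var_list=variational_parameters)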

sgvb()

Implements the Stochastic Gradient Variational Bayes (SGVB) gradient estimator for the ELBO, also known as the "reparameterization trick" or "path derivative estimator".

It only works for latent StochasticTensor s that can be reparameterized (Kingma, 2013). For example, Normal and Concrete.

Note

To use the sgvb() estimator, the is_reparameterized property of each latent StochasticTensor must be True (which is the default setting when they are constructed).

Returns:

A Tensor. The surrogate cost for Tensorflow optimizers to minimize.
tensor

Return the Tensor representing the value of the variational objective.

variational

The variational family. A BayesianNet instance. None if latent is given instead.

Inclusive KL divergence

klpq(meta_bn, observed, latent=None, axis=None, variational=None)

The inclusive KL objective for variational inference. The returned value is an InclusiveKLObjective instance.

See InclusiveKLObjective for examples of usage.

Parameters:
  • meta_bn – A MetaBayesianNet instance or a log joint probability function. For the latter, it must accept a dictionary argument of (string, Tensor) pairs, which are mappings from all node names in the model to their observed values. The function should return a Tensor, representing the log joint likelihood of the model.
  • observed – A dictionary of (string, Tensor) pairs. Mapping from names of observed stochastic nodes to their values.
  • latent – A dictionary of (string, (Tensor, Tensor)) pairs. Mapping from names of latent stochastic nodes to their samples and log probabilities. latent and variational are mutually exclusive.
  • axis – The sample dimension(s) to reduce when computing the outer expectation in the objective. If None, no dimension is reduced.
  • variational – A BayesianNet instance that defines the variational family. variational and latent are mutually exclusive.
Returns:

An InclusiveKLObjective instance.

class InclusiveKLObjective(meta_bn, observed, latent=None, axis=None, variational=None)

Bases: zhusuan.variational.base.VariationalObjective

The class that represents the inclusive KL objective (\(KL(p\|q)\), i.e., the KL-divergence between the true posterior \(p\) and the variational posterior \(q\)). This is the opposite direction of the one (\(KL(q\|p)\), or exclusive KL objective) that induces the ELBO objective.

An instance of the class can be constructed by calling klpq():

# klpq_obj is an InclusiveKLObjective instance
klpq_obj = zs.variational.klpq(
    meta_bn, observed, variational=variational, axis=axis)

Here meta_bn is a MetaBayesianNet instance representing the model to be inferred. variational is a BayesianNet instance that defines the variational family. axis is the index of the sample dimension used to estimate the expectation when computing the gradients.

Unlike most VariationalObjective instances, an InclusiveKLObjective instance cannot be used like a Tensor or evaluated, because in general this objective is not computable.

This objective can only be used for inference, i.e., optimizing it wrt. variational parameters (parameters in \(q\)). This is done by calling the supported gradient estimator to get the surrogate cost to minimize. Currently there is

  • importance(): The self-normalized importance sampling gradient estimator.

So the typical code for doing variational inference is like:

# call the gradient estimator to return the surrogate cost
cost = klpq_obj.importance()

# optimize the surrogate cost wrt. variational parameters
optimizer = tf.train.AdamOptimizer(learning_rate)
infer_op = optimizer.minimize(cost, var_list=variational_parameters)
with tf.Session() as sess:
    for _ in range(n_iters):
        _, c = sess.run([infer_op, cost], feed_dict=...)

Note

The inclusive KL objective is only a criterion for variational inference, not for model learning (optimizing it does not perform maximum likelihood learning the way the ELBO objective does). That means there is no reason to optimize the surrogate cost wrt. model parameters.

Parameters:
  • meta_bn – A MetaBayesianNet instance or a log joint probability function. For the latter, it must accept a dictionary argument of (string, Tensor) pairs, which are mappings from all node names in the model to their observed values. The function should return a Tensor, representing the log joint likelihood of the model.
  • observed – A dictionary of (string, Tensor) pairs. Mapping from names of observed stochastic nodes to their values.
  • latent – A dictionary of (string, (Tensor, Tensor)) pairs. Mapping from names of latent stochastic nodes to their samples and log probabilities. latent and variational are mutually exclusive.
  • axis – The sample dimension(s) to reduce when computing the outer expectation in the objective. If None, no dimension is reduced.
  • variational – A BayesianNet instance that defines the variational family. variational and latent are mutually exclusive.
bn

The BayesianNet constructed by observing the meta_bn with samples from the variational posterior distributions. None if the log joint probability function is provided instead of meta_bn.

Note

This BayesianNet instance is useful when computing predictions with the approximate posterior distribution.

importance()

Implements the self-normalized importance sampling gradient estimator for variational inference. This was used in the Reweighted Wake-Sleep (RWS) algorithm (Bornschein, 2015) to adapt the proposal, i.e., the variational posterior in the importance weighted objective (see ImportanceWeightedObjective). This estimator is now widely used for learning neural adaptive proposals in importance sampling.

It works for all types of latent StochasticTensor s.

Note

To use the importance() estimator, the is_reparameterized property of each reparameterizable latent StochasticTensor must be set False.

Returns:

A Tensor. The surrogate cost for Tensorflow optimizers to minimize.
meta_bn

The inferred model. A MetaBayesianNet instance. None if a log joint probability function is given instead.

rws()

(Deprecated) Alias of importance().

tensor

Return the Tensor representing the value of the variational objective.

variational

The variational family. A BayesianNet instance. None if latent is given instead.

Monte Carlo objectives

importance_weighted_objective(meta_bn, observed, latent=None, axis=None, variational=None)

The importance weighted objective for variational inference (Burda, 2015). The returned value is an ImportanceWeightedObjective instance.

See ImportanceWeightedObjective for examples of usage.

Parameters:
  • meta_bn – A MetaBayesianNet instance or a log joint probability function. For the latter, it must accept a dictionary argument of (string, Tensor) pairs, which are mappings from all node names in the model to their observed values. The function should return a Tensor, representing the log joint likelihood of the model.
  • observed – A dictionary of (string, Tensor) pairs. Mapping from names of observed stochastic nodes to their values.
  • latent – A dictionary of (string, (Tensor, Tensor)) pairs. Mapping from names of latent stochastic nodes to their samples and log probabilities. latent and variational are mutually exclusive.
  • axis – The sample dimension(s) to reduce when computing the outer expectation in the objective. If None, no dimension is reduced.
  • variational – A BayesianNet instance that defines the variational family. variational and latent are mutually exclusive.
Returns:

An ImportanceWeightedObjective instance.

iw_objective(meta_bn, observed, latent=None, axis=None, variational=None)

The importance weighted objective for variational inference (Burda, 2015). The returned value is an ImportanceWeightedObjective instance.

See ImportanceWeightedObjective for examples of usage.

Parameters:
  • meta_bn – A MetaBayesianNet instance or a log joint probability function. For the latter, it must accept a dictionary argument of (string, Tensor) pairs, which are mappings from all node names in the model to their observed values. The function should return a Tensor, representing the log joint likelihood of the model.
  • observed – A dictionary of (string, Tensor) pairs. Mapping from names of observed stochastic nodes to their values.
  • latent – A dictionary of (string, (Tensor, Tensor)) pairs. Mapping from names of latent stochastic nodes to their samples and log probabilities. latent and variational are mutually exclusive.
  • axis – The sample dimension(s) to reduce when computing the outer expectation in the objective. If None, no dimension is reduced.
  • variational – A BayesianNet instance that defines the variational family. variational and latent are mutually exclusive.
Returns:

An ImportanceWeightedObjective instance.

class ImportanceWeightedObjective(meta_bn, observed, latent=None, axis=None, variational=None)

Bases: zhusuan.variational.base.VariationalObjective

The class that represents the importance weighted objective for variational inference (Burda, 2015). An instance of the class can be constructed by calling importance_weighted_objective():

# lower_bound is an ImportanceWeightedObjective instance
lower_bound = zs.variational.importance_weighted_objective(
    meta_bn, observed, variational=variational, axis=axis)

Here meta_bn is a MetaBayesianNet instance representing the model to be inferred. variational is a BayesianNet instance that defines the variational family. axis is the index of the sample dimension used to estimate the expectation when computing the objective.

Instances of ImportanceWeightedObjective are Tensor-like. They can be automatically or manually cast into Tensors when fed into Tensorflow operations or used in computation with Tensors, or when the tensor property is accessed. They can also be evaluated like Tensors:

# evaluate the objective
with tf.Session() as sess:
    print(sess.run(lower_bound, feed_dict=...))

The objective computes the same importance-sampling based estimate of the marginal log likelihood of observed variables as is_loglikelihood(). The difference is that the estimate now serves as a variational objective, since it is also a lower bound of the marginal log likelihood (as long as the number of samples is finite). The variational posterior here is in fact the proposal. As a variational objective, ImportanceWeightedObjective provides two gradient estimators for the variational (proposal) parameters:

  • sgvb(): The Stochastic Gradient Variational Bayes (SGVB) estimator, also known as “the reparameterization trick”, or “path derivative estimator”.
  • vimco(): The multi-sample score function estimator with variance reduction, also known as “VIMCO”.

The typical code for joint inference and learning is like:

# choose a gradient estimator to return the surrogate cost
cost = lower_bound.sgvb()
# or
# cost = lower_bound.vimco()

# optimize the surrogate cost wrt. model and variational
# parameters
optimizer = tf.train.AdamOptimizer(learning_rate)
infer_and_learn_op = optimizer.minimize(
    cost, var_list=model_and_variational_parameters)
with tf.Session() as sess:
    for _ in range(n_iters):
        _, lb = sess.run([infer_and_learn_op, lower_bound], feed_dict=...)

Note

Don’t directly optimize the ImportanceWeightedObjective instance wrt. variational parameters, i.e., parameters in \(q\). Instead a proper gradient estimator should be chosen to produce the correct surrogate cost to minimize, as shown in the above code snippet.

Because the outer expectation in the objective is not related to model parameters, it’s fine to directly optimize the class instance wrt. model parameters:

# optimize wrt. model parameters
learn_op = optimizer.minimize(-lower_bound,
                              var_list=model_parameters)
# or
# learn_op = optimizer.minimize(cost, var_list=model_parameters)
# both ways are correct

The above provides a way for users to combine the importance weighted objective with different methods of adapting the proposal (\(q\)). In this situation the true posterior is a good choice of proposal, which indicates that any variational objective can be used for the adaptation. In particular, when the klpq() objective is chosen, this reproduces the Reweighted Wake-Sleep algorithm (Bornschein, 2015) for learning deep generative models.
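
A minimal sketch of this combination, reusing the variable names from the snippets above (the two objectives share the same meta_bn, observed and variational):

# RWS-style setup: the importance weighted objective learns model
# parameters, while the klpq objective adapts the proposal q
lower_bound = zs.variational.importance_weighted_objective(
    meta_bn, observed, variational=variational, axis=0)
klpq_obj = zs.variational.klpq(
    meta_bn, observed, variational=variational, axis=0)

optimizer = tf.train.AdamOptimizer(learning_rate)
learn_op = optimizer.minimize(-lower_bound, var_list=model_parameters)
infer_op = optimizer.minimize(klpq_obj.importance(),
                              var_list=variational_parameters)
with tf.Session() as sess:
    for _ in range(n_iters):
        _, _, lb = sess.run([learn_op, infer_op, lower_bound],
                            feed_dict=...)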

Parameters:
  • meta_bn – A MetaBayesianNet instance or a log joint probability function. For the latter, it must accept a dictionary argument of (string, Tensor) pairs, which are mappings from all node names in the model to their observed values. The function should return a Tensor, representing the log joint likelihood of the model.
  • observed – A dictionary of (string, Tensor) pairs. Mapping from names of observed stochastic nodes to their values.
  • latent – A dictionary of (string, (Tensor, Tensor)) pairs. Mapping from names of latent stochastic nodes to their samples and log probabilities. latent and variational are mutually exclusive.
  • axis – The sample dimension(s) to reduce when computing the outer expectation in the objective. If None, no dimension is reduced.
  • variational – A BayesianNet instance that defines the variational family. variational and latent are mutually exclusive.
bn

The BayesianNet constructed by observing the meta_bn with samples from the variational posterior distributions. None if the log joint probability function is provided instead of meta_bn.

Note

This BayesianNet instance is useful when computing predictions with the approximate posterior distribution.

meta_bn

The inferred model. A MetaBayesianNet instance. None if a log joint probability function is given instead.

sgvb()

Implements the Stochastic Gradient Variational Bayes (SGVB) gradient estimator for the objective, also known as the "reparameterization trick" or "path derivative estimator". It was first used for importance weighted objectives in (Burda, 2015), where the resulting model is named "IWAE".

It only works for latent StochasticTensor s that can be reparameterized (Kingma, 2013). For example, Normal and Concrete.

Note

To use the sgvb() estimator, the is_reparameterized property of each latent StochasticTensor must be True (which is the default setting when they are constructed).

Returns:

A Tensor. The surrogate cost for Tensorflow optimizers to minimize.
tensor

Return the Tensor representing the value of the variational objective.

variational

The variational family. A BayesianNet instance. None if latent is given instead.

vimco()

Implements the multi-sample score function gradient estimator for the objective, also known as "VIMCO", the name given by the authors of the original paper (Mnih, 2016).

It works for all kinds of latent StochasticTensor s.

Note

To use the vimco() estimator, the is_reparameterized property of each reparameterizable latent StochasticTensor must be set False.

Returns:

A Tensor. The surrogate cost for Tensorflow optimizers to minimize.