# Basic Concepts in ZhuSuan

## Distribution

Distributions are the basic components for building probabilistic models.
The `Distribution` class is the base class
for various probabilistic distributions, which support batch inputs,
generating batches of samples, and evaluating probabilities at batches of
given values.

The list of all available distributions can be found on these pages:

We can create a univariate Normal distribution in ZhuSuan by:

```
>>> import zhusuan as zs
>>> a = zs.distributions.Normal(mean=0., logstd=0.)
```

The typical input shape for a `Distribution` is like
`batch_shape + input_shape`, where `input_shape` represents the
shape of a non-batch input parameter, and `batch_shape` represents how
many independent inputs are fed into the distribution.
In general, distributions support broadcasting for inputs.
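As a rough illustration of how a batch shape arises from broadcasting, the sketch below broadcasts several parameter shapes in plain Python. It assumes ZhuSuan inherits the usual NumPy/TensorFlow broadcasting rules (trailing axes aligned, each pair of sizes equal or one of them 1); `broadcast_shape` is a hypothetical helper, not part of ZhuSuan.

```python
from itertools import zip_longest


def broadcast_shape(*shapes):
    """Broadcast shapes with NumPy-style rules (hypothetical helper)."""
    result = []
    # Compare axes from the last one backwards; missing axes count as 1.
    for sizes in zip_longest(*(reversed(s) for s in shapes), fillvalue=1):
        non_one = {d for d in sizes if d != 1}
        if len(non_one) > 1:
            raise ValueError("shapes %r are not broadcastable" % (shapes,))
        result.append(non_one.pop() if non_one else 1)
    return list(reversed(result))


# A mean of shape [2, 2] with a logstd of shape [2] gives batch_shape [2, 2].
print(broadcast_shape([2, 2], [2]))  # [2, 2]
```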

Samples can be generated by calling the `sample()` method of distribution
objects.
The shape is `([n_samples] + )batch_shape + value_shape`.
The leading `n_samples` axis is omitted only when `n_samples` is None
(the default), in which case a single sample is generated. `value_shape` is
the non-batch value shape of the distribution.
For a univariate distribution, its `value_shape` is `[]`.

An example of a univariate distribution (`Normal`):

```
>>> import tensorflow as tf
>>> _ = tf.InteractiveSession()
>>> b = zs.distributions.Normal([[-1., 1.], [0., -2.]], logstd=[0., 1.])
>>> b.batch_shape.eval()
array([2, 2], dtype=int32)
>>> b.value_shape.eval()
array([], dtype=int32)
>>> tf.shape(b.sample()).eval()
array([2, 2], dtype=int32)
>>> tf.shape(b.sample(1)).eval()
array([1, 2, 2], dtype=int32)
>>> tf.shape(b.sample(10)).eval()
array([10, 2, 2], dtype=int32)
```

and an example of a multivariate distribution (`OnehotCategorical`):

```
>>> c = zs.distributions.OnehotCategorical([[0., 1., -1.],
...                                         [2., 3., 4.]])
>>> c.batch_shape.eval()
array([2], dtype=int32)
>>> c.value_shape.eval()
array([3], dtype=int32)
>>> tf.shape(c.sample()).eval()
array([2, 3], dtype=int32)
>>> tf.shape(c.sample(1)).eval()
array([1, 2, 3], dtype=int32)
>>> tf.shape(c.sample(10)).eval()
array([10, 2, 3], dtype=int32)
```
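The sample-shape rule above can be written as a tiny plain-Python function and checked against the outputs of both examples; `sample_shape` is a hypothetical helper for illustration, not a ZhuSuan function.

```python
def sample_shape(batch_shape, value_shape, n_samples=None):
    """Shape rule ([n_samples] + )batch_shape + value_shape."""
    shape = list(batch_shape) + list(value_shape)
    # The leading sample axis appears only when n_samples is passed,
    # so n_samples=1 still adds an axis of size 1.
    if n_samples is not None:
        shape = [n_samples] + shape
    return shape


# Normal example: batch_shape [2, 2], value_shape []
print(sample_shape([2, 2], []))      # [2, 2]
print(sample_shape([2, 2], [], 1))   # [1, 2, 2]
# OnehotCategorical example: batch_shape [2], value_shape [3]
print(sample_shape([2], [3], 10))    # [10, 2, 3]
```

The distinction between `n_samples=None` and `n_samples=1` matters when samples are later combined with other tensors, since only the latter carries an explicit sample axis.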

There are cases where a batch of random variables are grouped into a
single event so that their probabilities can be computed together.
This is achieved by setting the `group_ndims` argument, which defaults to 0.
The last `group_ndims` axes of `batch_shape` are grouped into a single event.
For example, `Normal(..., group_ndims=1)` will treat the last axis of its
`batch_shape` as a single event, i.e., a multivariate Normal with a diagonal
covariance matrix (the identity when all standard deviations are 1).
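With `group_ndims=1`, the log density of a grouped event is the sum of the univariate log densities along the last batch axis. The plain-Python sketch below computes this by hand for a `[2, 2]` batch of means with unit standard deviation, evaluated at 0; the helper names are hypothetical and not part of ZhuSuan.

```python
import math


def normal_log_prob(value, mean, std):
    """Log density of a univariate Normal(mean, std**2) at value."""
    return (-0.5 * math.log(2 * math.pi) - math.log(std)
            - 0.5 * ((value - mean) / std) ** 2)


def grouped_log_prob(value, means, stds):
    """Sum independent per-axis log densities: the log density of a
    Normal with diagonal covariance over the grouped axis."""
    return sum(normal_log_prob(value, m, s) for m, s in zip(means, stds))


means = [[-1., 1.], [0., -2.]]
# Unit std (logstd = 0); each row of the batch is one grouped event.
values = [grouped_log_prob(0., row, [1., 1.]) for row in means]
print(values)  # approximately [-2.8379, -3.8379]
```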

The log probability density (mass) function can be evaluated by passing given
values to the `log_prob()` method of distribution objects.
In that case, the given Tensor should be broadcastable to shape
`(... + )batch_shape + value_shape`.
The returned Tensor has shape `(... + )batch_shape[:-group_ndims]`
(the full `batch_shape` when `group_ndims` is 0).
For example:

```
>>> d = zs.distributions.Normal([[-1., 1.], [0., -2.]], logstd=0.,
...                             group_ndims=1)
>>> d.log_prob(0.).eval()
array([-2.83787704, -3.83787727], dtype=float32)
>>> e = zs.distributions.Normal(tf.zeros([2, 1, 3]), logstd=0.,
...                             group_ndims=2)
>>> tf.shape(e.log_prob(tf.zeros([5, 1, 1, 3]))).eval()
array([5, 2], dtype=int32)
```
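The shape rule for `log_prob()` can likewise be sketched in plain Python: broadcast the given shape against `batch_shape + value_shape`, drop the value axes, then drop the last `group_ndims` batch axes. Both helpers are hypothetical; slicing with `len(full) - n_drop` sidesteps the fact that Python's `[:-0]` would return an empty list when `group_ndims` is 0.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """NumPy-style broadcast of two shapes (trailing axes aligned)."""
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise ValueError("shapes %r and %r are not broadcastable" % (a, b))
        out.append(y if x == 1 else x)
    return list(reversed(out))


def log_prob_shape(given_shape, batch_shape, value_shape, group_ndims=0):
    """Shape of log_prob(given) under (... + )batch_shape[:-group_ndims]."""
    full = broadcast_shape(list(given_shape),
                           list(batch_shape) + list(value_shape))
    # Drop the value axes, then the last group_ndims batch axes.
    n_drop = len(value_shape) + group_ndims
    return full[:len(full) - n_drop]


print(log_prob_shape([5, 1, 1, 3], [2, 1, 3], [], group_ndims=2))  # [5, 2]
print(log_prob_shape([], [2, 2], [], group_ndims=1))               # [2]
```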

## BayesianNet

In ZhuSuan we support building probabilistic models as Bayesian networks, i.e., directed graphical models. Below we use a simple Bayesian linear regression example to illustrate this. The generative process of the model is

\[
\begin{aligned}
w &\sim \mathrm{N}(0, \alpha^2 I), \\
y &\sim \mathrm{N}(w^\top x, \beta^2),
\end{aligned}
\]

where \(x\) denotes the input feature in the linear regression. We apply a Bayesian treatment and assume a Normal prior distribution on the regression weights \(w\). Suppose the input feature has 5 dimensions. For simplicity we define the input as a placeholder and fix the hyper-parameters:

```
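import tensorflow as tf
import zhusuan as zs

x = tf.placeholder(tf.float32, shape=[5])  # input feature with 5 dimensions
alpha = 1.   # prior standard deviation of the weights w
beta = 0.1   # noise standard deviation of the output y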
```

To define the model, the first step is to construct a `BayesianNet`
instance:

```
bn = zs.BayesianNet()
```

A Bayesian network describes the dependency structure of the joint
distribution over a set of random variables as a directed graph.
To support this, a `BayesianNet` instance can keep two kinds of nodes:

- Stochastic nodes. They are random variables in graphical models. The
  `w` node can be constructed as:

  ```
  w = bn.normal("w", tf.zeros([x.shape[-1]]), std=alpha)
  ```

  Here `w` is a `StochasticTensor` that follows the `Normal` distribution:

  ```
  >>> print(w)
  <zhusuan.framework.bn.StochasticTensor object at ...
  ```

  For any distribution available in `zhusuan.distributions`, we can find a
  method of `BayesianNet` for creating the corresponding stochastic node.
  The returned `StochasticTensor` instances are Tensor-like, which means
  that you can mix them with almost any TensorFlow primitives. For example,
  the predicted mean of the linear regression is an inner product between
  `w` and the input `x`:

  ```
  y_mean = tf.reduce_sum(w * x, axis=-1)
  ```

- Deterministic nodes. As the above code shows, deterministic nodes can be
  constructed directly with TensorFlow operations, in which case
  `BayesianNet` does not keep track of them. However, in some cases it is
  convenient to enable the tracking with the `deterministic()` method:

  ```
  y_mean = bn.deterministic("y_mean", tf.reduce_sum(w * x, axis=-1))
  ```

  This allows you to fetch the `y_mean` Tensor from `bn` whenever you
  want it.

The full code of building a Bayesian linear regression model is like:

```
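def bayesian_linear_regression(x, alpha, beta):
    bn = zs.BayesianNet()
    # prior: w ~ N(0, alpha^2 I), one weight per input dimension
    w = bn.normal("w", tf.zeros([x.shape[-1]]), std=alpha)
    # deterministic node: predicted mean of the regression
    y_mean = tf.reduce_sum(w * x, axis=-1)
    # likelihood: y ~ N(y_mean, beta^2)
    bn.normal("y", y_mean, std=beta)
    return bn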
```
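The ZhuSuan function above builds a TensorFlow graph. For intuition only, the same generative process can be written as a plain-Python ancestral sampler: draw \(w\) from its prior, compute the deterministic mean, then draw \(y\). This is an illustration, not ZhuSuan code; `sample_bayesian_linear_regression` is a hypothetical name, and `alpha` and `beta` are treated as standard deviations, matching the `std=` arguments.

```python
import random


def sample_bayesian_linear_regression(x, alpha, beta, rng=random):
    """Ancestral sampling: w from its prior, then y given w."""
    # w ~ N(0, alpha^2 I): one independent weight per input dimension
    w = [rng.gauss(0., alpha) for _ in x]
    # deterministic node: inner product between w and x
    y_mean = sum(wi * xi for wi, xi in zip(w, x))
    # y ~ N(y_mean, beta^2)
    y = rng.gauss(y_mean, beta)
    return w, y_mean, y


x = [1., 2., 3., 4., 5.]
w, y_mean, y = sample_bayesian_linear_regression(x, 1., 0.1,
                                                 random.Random(0))
```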

A unique feature of graphical models is that stochastic nodes are allowed to
have undetermined behaviour (i.e., being latent), and we can observe them at
any time (then they are fixed to the observations).
In ZhuSuan, the `BayesianNet` can be initialized
with a dictionary argument `observed` to assign observations to certain
stochastic nodes, for example:

```
bn = zs.BayesianNet(observed={"w": w_obs})
```

will cause the random variable \(w\) to be observed as `w_obs`.
The result is that in `bn`, `y_mean` is computed from the observed value
of `w` (`w_obs`).
For stochastic nodes that are not given observations, their samples will be
used when the corresponding `StochasticTensor` is
involved in computation with Tensors or fed into TensorFlow operations.
In this example, it means that if we don't pass any observation to `bn`, the
samples of `w` will be used to compute `y_mean`.
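The rule "use the observation if one is given, otherwise a sample" can be mimicked in plain Python. The toy class below is a hypothetical stand-in written only to illustrate these semantics; it is not how `BayesianNet` is implemented, and its `normal(name, mean, std, size)` signature is invented for this sketch.

```python
import random


class ToyBayesianNet:
    """Hypothetical stand-in for BayesianNet, illustrating one rule only:
    a stochastic node takes its observed value if one was given and a
    fresh sample otherwise."""

    def __init__(self, observed=None):
        self.observed = dict(observed or {})
        self.nodes = {}

    def normal(self, name, mean, std, size=None, rng=random):
        if name in self.observed:
            value = self.observed[name]          # fixed to the observation
        elif size is None:
            value = rng.gauss(mean, std)         # one latent sample
        else:
            value = [rng.gauss(mean, std) for _ in range(size)]
        self.nodes[name] = value
        return value


x = [1., 2., 3., 4., 5.]
w_obs = [0.5, -0.5, 0., 1., 0.]

bn = ToyBayesianNet(observed={"w": w_obs})
w = bn.normal("w", 0., 1., size=len(x))
y_mean = sum(wi * xi for wi, xi in zip(w, x))   # computed from w_obs
print(y_mean)  # 3.5

latent_bn = ToyBayesianNet()                             # nothing observed
w_sample = latent_bn.normal("w", 0., 1., size=len(x))    # w is sampled
```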

Although the above approach allows assigning observations to stochastic
nodes, in most cases it is more convenient to first define the
graphical model, and then pass observations whenever needed.
Moreover, the model should allow queries with different configurations of
observations.
To enable this workflow, we introduce a new class `MetaBayesianNet`.
Conceptually, we can view `MetaBayesianNet` instances as the original
model and `BayesianNet` instances as the result of certain
observations.
As we shall see, `BayesianNet` instances can be
lazily constructed from their meta class instance.

We made it very easy to define the model as a `MetaBayesianNet`.
The above code needs no change other than adding a decorator to the
function:

```
@zs.meta_bayesian_net(scope="model")
def bayesian_linear_regression(x, alpha, beta):
    bn = zs.BayesianNet()
    w = bn.normal("w", tf.zeros([x.shape[-1]]), std=alpha)
    y_mean = tf.reduce_sum(w * x, axis=-1)
    bn.normal("y", y_mean, std=beta)
    return bn
```

The function decorated by `zs.meta_bayesian_net()` will return a
`MetaBayesianNet` instead of the original `BayesianNet` instance:

```
>>> model = bayesian_linear_regression(x, alpha, beta)
>>> print(model)
<zhusuan.framework.meta_bn.MetaBayesianNet object at ...
```

As we have mentioned, `MetaBayesianNet` allows different configurations of
observations.
This is achieved by its `observe()` method.
We can pass observations as named arguments, and it will return a
corresponding `BayesianNet` instance, for example:

```
bn = model.observe(w=w_obs)
```

will set `w` to be observed in the returned `BayesianNet` instance `bn`.
Calling `observe()` with different named arguments instantiates
`BayesianNet` instances with different observations,
which resembles the common behaviour of probabilistic graphical models.
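The lazy, per-configuration construction can be sketched in plain Python as a wrapper that stores the model-building function and its arguments, and rebuilds a network on every `observe()` call. `ToyMetaBayesianNet` and `toy_meta_bayesian_net` are hypothetical names that only mirror the roles of `MetaBayesianNet` and `zs.meta_bayesian_net()`; the real classes work on TensorFlow graphs and are not implemented this way.

```python
import random


class ToyMetaBayesianNet:
    """Hypothetical sketch: remember how to build the model and build a
    concrete network only when observe() supplies observations."""

    def __init__(self, build_fn, args):
        self._build_fn = build_fn
        self._args = args

    def observe(self, **observed):
        # Lazily construct a network for this configuration of observations.
        return self._build_fn(*self._args, observed=observed)


def toy_meta_bayesian_net(build_fn):
    """Hypothetical decorator: calling the decorated function returns a
    ToyMetaBayesianNet instead of a built network."""
    def wrapper(*args):
        return ToyMetaBayesianNet(build_fn, args)
    return wrapper


@toy_meta_bayesian_net
def linear_regression(x, alpha, beta, observed):
    # Each stochastic node uses its observation if given, else a sample.
    if "w" in observed:
        w = observed["w"]
    else:
        w = [random.gauss(0., alpha) for _ in x]
    y_mean = sum(wi * xi for wi, xi in zip(w, x))
    y = observed["y"] if "y" in observed else random.gauss(y_mean, beta)
    return {"w": w, "y_mean": y_mean, "y": y}


model = linear_regression([1., 2.], 1., 0.1)
bn1 = model.observe(w=[1., 1.])          # w observed, y sampled
bn2 = model.observe(w=[1., 1.], y=2.5)   # w and y both observed
print(bn1["y_mean"], bn2["y"])           # 3.0 2.5
```

Each `observe()` call returns an independent network, so the same model definition can be queried under several observation configurations.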

Note: The observation passed must have the same type and shape as the
`StochasticTensor`.

If there are TensorFlow Variables
created in a model construction function, you may want to reuse them for
`BayesianNet` instances with different observations.
There is another decorator in ZhuSuan named `reuse_variables()` to make
this convenient.
You can add it to any function that creates TensorFlow variables:

```
@zs.reuse_variables(scope="model")
def build_model(...):
    bn = zs.BayesianNet()
    ...
    return bn
```

or equivalently, switch on the `reuse_variables` option in the
`zs.meta_bayesian_net()` decorator:

```
@zs.meta_bayesian_net(scope="model", reuse_variables=True)
def build_model(...):
    bn = zs.BayesianNet()
    ...
    return bn
```
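Conceptually, variable reuse means that every call to the model-building function gets back the same parameter objects instead of creating fresh ones, so all `BayesianNet` instances share them. In TensorFlow 1.x this kind of sharing is what variable scopes with `reuse` provide. The plain-Python analogy below only illustrates the idea; `ToyVariableStore` and its `get_variable` method are hypothetical and do not reproduce how ZhuSuan or TensorFlow implement it.

```python
class ToyVariableStore:
    """Hypothetical analogy: variables are keyed by "scope/name", and a
    repeated request for the same key returns the existing object."""

    def __init__(self):
        self._variables = {}

    def get_variable(self, scope, name, init):
        key = scope + "/" + name
        if key not in self._variables:
            # Created only on the first request for this key.
            self._variables[key] = {"name": key, "value": init()}
        return self._variables[key]


store = ToyVariableStore()


def build_model(observed):
    # A trainable parameter that should be shared across observations.
    weight = store.get_variable("model", "weight", init=lambda: [0.] * 5)
    return {"weight": weight, "observed": observed}


bn_a = build_model({"y": 1.})
bn_b = build_model({"y": 2.})
print(bn_a["weight"] is bn_b["weight"])  # True
```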

Up to now, we know how to construct a model and reuse it for different
observations.
After construction, `BayesianNet` supports queries
about the current state of the network, such as:

```
# get named node(s)
w = bn["w"]
w, y = bn.get(["w", "y"])
# get log probabilities of stochastic nodes conditioned on the current
# value of other StochasticTensors.
log_pw, log_py = bn.cond_log_prob(["w", "y"])
# get log joint probability given the current values of all stochastic
# nodes
log_joint_value = bn.log_joint()
```

By default, the log joint probability is computed by summing the
conditional log probabilities of all stochastic nodes.
This requires that the distribution batch shapes of all stochastic nodes
are correctly aligned.
If they are not, the returned value may not be the intended log joint.
Most of the time you can adjust the `group_ndims` parameter of the stochastic
nodes to fix this.
If that is not possible, you can still customize the log joint
probability function by overriding it on the `MetaBayesianNet`
instance like:

```
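meta_bn = bayesian_linear_regression(x, alpha, beta)

def customized_log_joint(bn):
    # log p(w) has one entry per weight (group_ndims=0), so reduce it over
    # the last axis before adding the scalar log p(y | w)
    return tf.reduce_sum(
        bn.cond_log_prob("w"), axis=-1) + bn.cond_log_prob("y")

meta_bn.log_joint = customized_log_joint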
```

then all `BayesianNet` instances constructed
from this `meta_bn` will use the provided customized function to compute
the result of `bn.log_joint()`.
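To see why the batch shapes must be aligned, the plain-Python sketch below evaluates the two terms of `customized_log_joint` by hand for made-up values of `w`, `x` and `y` (illustrative numbers, not real data). With `group_ndims=0`, the log probability of `w` has one entry per weight while that of `y` is a scalar. Reducing first gives one scalar log joint; adding first broadcasts the scalar across the five entries, so a later reduction counts \(\log p(y \mid w)\) five times, which is one way an unaligned sum goes wrong.

```python
import math


def normal_log_prob(value, mean, std):
    """Log density of a univariate Normal(mean, std**2) at value."""
    return (-0.5 * math.log(2 * math.pi) - math.log(std)
            - 0.5 * ((value - mean) / std) ** 2)


x = [1., 2., 3., 4., 5.]
w = [0.1, -0.2, 0.3, 0.0, 0.1]  # illustrative weights
y = 1.2                         # illustrative observation
alpha, beta = 1., 0.1

log_pw = [normal_log_prob(wi, 0., alpha) for wi in w]  # one value per weight
y_mean = sum(wi * xi for wi, xi in zip(w, x))
log_py = normal_log_prob(y, y_mean, beta)              # a single scalar

# Aligned, as in customized_log_joint: reduce over w's axis, then add.
aligned = sum(log_pw) + log_py
# Unaligned: log_py is broadcast to all five entries before reducing.
misaligned = sum(lp + log_py for lp in log_pw)
print(misaligned - aligned)  # equals 4 * log_py up to rounding
```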