SPN Frequently Asked Questions

Can you convert between SPNs and graphical models?

Yes, any graphical model can be represented by an SPN. The naive approach is to create a root sum node with one child product node for each possible joint state of the variables, weighted by the product of the factors evaluated at that state. A better way is to construct a junction tree from the graphical model and then translate it to an SPN (see the BuildSPN question below).
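
As a concrete illustration of the naive construction, here is a minimal Python sketch for a toy model over two binary variables with a single pairwise factor. The names (phi, naive_spn_value) and the factor values are made up for illustration, not taken from any reference implementation.

import itertools

# Toy model: two binary variables X0, X1 with one pairwise factor phi.
# phi[(x0, x1)] is the (unnormalized) potential of that joint state.
phi = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 4.0}

def naive_spn_value(evidence):
    """Evaluate the 'flat' SPN: a root sum node with one product node
    per joint state, weighted by that state's potential. evidence maps
    a variable index to an observed value; unobserved variables are
    summed out (their indicator leaves evaluate to 1)."""
    total = 0.0
    for x in itertools.product([0, 1], repeat=2):   # one product node per state
        # Indicator leaves: 1 if the state is consistent with the evidence.
        consistent = all(evidence.get(i, xi) == xi for i, xi in enumerate(x))
        total += phi[x] * consistent                # weighted child of the root sum
    return total

print(naive_spn_value({}))      # partition function Z = 9.0
print(naive_spn_value({0: 1}))  # unnormalized P(X0 = 1) = 2 + 4 = 6.0

The flat SPN has one product node per joint state, so it grows exponentially with the number of variables; the junction-tree construction described below avoids this blowup when the model has low treewidth.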

The benefit of SPNs is not that they somehow make existing GMs tractable; if a graphical model requires approximate inference, then it will translate to a very large SPN.

What's special about SPNs is that they can compactly represent, and perform fast inference on, distributions with high-treewidth dependencies. These high-treewidth SPNs have no compact GM representation. The following diagram may be helpful:

[Diagram omitted: compactly-representable probability distributions]

How do you create an SPN from a junction tree?

If R is the root of the junction tree, run BuildSPN(R, {}).

BuildSPN(Cluster C, Evidence e):
    If key (C, e) exists in M, return M(C, e)
    Create sum node S
    For each joint state x of variables X = (Va(C) \ e):
        Create product node P
        For each variable setting v in x:
            Create univariate distribution with mass on v as child of P
        Set w to be the product of all factors in C evaluated at state (x, e)
        For each child C' of C:
            Add BuildSPN(C', restriction of (x, e) to Sepset(C, C')) as child of P
        Add P as child of S with edge-weight w
    Store (C, e) → S in map M
    Return S

Va(C) denotes the set of variables in cluster C.
Sepset(C, C') is the set of variables in the sepset between clusters C and C'.
M is a map that caches solutions: its keys are cluster-evidence pairs and its values are SPN nodes.
Bold denotes a vector.
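
The pseudocode translates fairly directly to code. Below is a compact Python sketch; for simplicity it assumes binary variables with indicator leaves, and the Cluster class, the tuple encoding of SPN nodes, and all names are our own illustrative choices rather than a reference implementation.

import itertools

# Minimal junction-tree node: a cluster holds its variables Va(C), the
# factors assigned to it (callables from a state dict to a float), and
# its child clusters.
class Cluster:
    def __init__(self, variables, factors, children=()):
        self.variables = set(variables)
        self.factors = list(factors)
        self.children = list(children)

def sepset(c, c_child):
    return c.variables & c_child.variables

# SPN nodes are encoded as tuples: ('sum', [(weight, child), ...]),
# ('product', [children]), and ('leaf', var, value) for a univariate
# indicator distribution with all its mass on one value.
def build_spn(cluster, evidence, memo):
    key = (id(cluster), frozenset(evidence.items()))
    if key in memo:                        # cache hit: reuse the sub-SPN
        return memo[key]
    free_vars = sorted(cluster.variables - set(evidence))
    sum_children = []
    for values in itertools.product([0, 1], repeat=len(free_vars)):
        x = dict(zip(free_vars, values))
        state = {**evidence, **x}          # the joint state (x, e)
        prod_children = [('leaf', v, x[v]) for v in free_vars]
        w = 1.0
        for f in cluster.factors:          # product of C's factors at (x, e)
            w *= f(state)
        for child in cluster.children:
            e2 = {v: state[v] for v in sepset(cluster, child)}
            prod_children.append(build_spn(child, e2, memo))
        sum_children.append((w, ('product', prod_children)))
    node = ('sum', sum_children)
    memo[key] = node
    return node

The memo map M is what makes the result a DAG rather than a tree: any two recursive calls with the same (cluster, evidence) key share a single sum node instead of duplicating the sub-network.
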
If you convert a graphical model to an SPN, is it just a single product layer that computes the product of all the factors?

No. There is a big difference between the product-of-factors description of a GM and the graph of computations in an SPN. An SPN represents all possible inference queries, each computable in time linear in the size of the network, whereas the factorization of a GM only tells us the (unnormalized) probability of a complete state. To perform inference on a graphical model, we put an outer summation around the product of factors. With most graphical models, we can "push" the summations over certain variables inward so that they cover a smaller set of factors. This means that the corresponding SPN will not be "flat" (a root sum over one product of factors per joint variable state) but will have multiple layers that exploit the distributive law. While there are some intractable GMs that would translate to flat or otherwise unwieldy SPNs, this just reflects the computations required to perform inference on a model described by conditional independencies.
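
A small numeric example may help. Take a chain over binary variables A, B, C with pairwise factors f1(A, B) and f2(B, C); the factor tables below are arbitrary values chosen for illustration. Pushing the sums over A and C inside the product turns the flat enumeration into a two-layer computation:

import itertools

# Chain GM: P(a, b, c) is proportional to f1(a, b) * f2(b, c).
f1 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
f2 = {(0, 0): 5.0, (0, 1): 6.0, (1, 0): 7.0, (1, 1): 8.0}

# Flat: a root sum over all 2^3 joint states, one product node each.
flat = sum(f1[a, b] * f2[b, c]
           for a, b, c in itertools.product([0, 1], repeat=3))

# Layered: Z = sum_b (sum_a f1(a, b)) * (sum_c f2(b, c)), i.e. a sum
# node over B whose children are products of two smaller sum nodes.
layered = sum(sum(f1[a, b] for a in [0, 1]) * sum(f2[b, c] for c in [0, 1])
              for b in [0, 1])

print(flat, layered)  # both print 134.0: the same partition function Z

For this tiny chain the savings are negligible, but as the chain grows the flat form needs exponentially many product nodes while the layered form grows only linearly.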

Are SPNs just trees?

No; in general they are directed acyclic graphs. See the experiments in Poon & Domingos 2011 for an example architecture.

What are the drawbacks of tree-structured SPNs? Are they similar to tree-structured graphical models?

A tree-structured SPN has the same representational power as a DAG SPN. The advantage of a DAG SPN is that it reuses computation: shared sub-networks appear once instead of being duplicated, so the network needs fewer nodes and inference is faster. A tree-structured SPN can still represent high-treewidth dependencies that would form a complete clique if converted to a GM, whereas a tree-structured GM is limited to low-treewidth dependencies.
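
To make the reuse concrete, here is a tiny DAG SPN in the same tuple encoding as the build_spn sketch above, together with a linear-time evaluator; the network and its weights are made up for illustration. The cache ensures the shared sum node S is evaluated once even though two product nodes point to it.

# S is a child of both P1 and P2, so this SPN is a DAG, not a tree.
S = ('sum', [(0.3, ('leaf', 'X', 0)), (0.7, ('leaf', 'X', 1))])
P1 = ('product', [('leaf', 'Y', 0), S])
P2 = ('product', [('leaf', 'Y', 1), S])
root = ('sum', [(0.4, P1), (0.6, P2)])

def value(node, evidence, cache=None):
    """Evaluate an SPN bottom-up in time linear in the number of edges."""
    cache = {} if cache is None else cache
    if id(node) in cache:              # each shared node is computed once
        return cache[id(node)]
    kind = node[0]
    if kind == 'leaf':
        _, var, val = node
        v = 1.0 if evidence.get(var, val) == val else 0.0
    elif kind == 'product':
        v = 1.0
        for child in node[1]:
            v *= value(child, evidence, cache)
    else:                              # sum node
        v = sum(w * value(child, evidence, cache) for w, child in node[1])
    cache[id(node)] = v
    return v

print(value(root, {}))        # 1.0: a normalized SPN sums to one
print(value(root, {'Y': 1}))  # 0.6: the marginal P(Y = 1)

A tree-structured version of the same distribution would have to duplicate S under each product node; the DAG stores and evaluates it once.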

Can SPNs represent convolution?

Yes; weights can be tied across sum nodes at different locations, playing the same role as a shared convolutional filter. See the experimental architecture in Gens & Domingos 2012.
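
As a rough sketch of the idea, and not the exact architecture of Gens & Domingos 2012 (the array shapes and names here are placeholders), weight-tying means every image location's sum node applies the same weight vector to the values of its local product nodes, much as a convolutional filter is slid across an image:

import numpy as np

rng = np.random.default_rng(0)
tied_w = rng.random(4)          # one weight vector shared by all locations
tied_w /= tied_w.sum()          # normalized sum-node weights

# Placeholder values of 4 product nodes (mixture components) at each of
# the 8x8 patch locations, e.g. produced by lower layers of the SPN.
patch_vals = rng.random((8, 8, 4))

# Every location's sum node uses the *same* tied weights:
sum_vals = patch_vals @ tied_w  # shape (8, 8): one sum-node value per location
print(sum_vals.shape)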

What are the drawbacks of SPNs?

It takes more effort to hand-design local independencies than global independencies. This would appear to give graphical models (GMs) an advantage, but the simplicity of a graph over variables belies the true cost of inference and learning. This has motivated several papers on learning the structure of SPNs from data. These papers show not only that automatically learned SPN structures outperform hand-designed SPNs [Dennis & Ventura 2012], but also that the learned SPNs are more accurate and orders of magnitude faster at inference than learned graphical models [Gens & Domingos 2013].