Probabilistic programming: a tentative first encounter

While I have some experience with probabilistic programming in the flavour of Bayesian Networks, and have published papers using them, I am interested in the super-class of generic probabilistic programs. That is, right now I am happy with conducting inference on Bayesian Networks, but I want to learn how to conduct inference on generic programs. As part of this side-project I am working through some examples. The one discussed below is borrowed from a talk given by Noah Goodman, which I recommend you watch.

The functional programming paradigm is popular in this scene, but I am trying to focus on the probabilistic programming concepts, so for the moment I am avoiding the head-f**k move from imperative to functional programming. I worked through one of the examples in the talk, turning it into imperative (Matlab) code. This was a very useful exercise, and I’d say I now have a basic understanding of the conceptual approach, and a slightly better understanding of functional programming.

EDIT: I took an hour out of my life to convert this into Julia code. It was quite easy, and I liked doing it in Jupyter. Click here for the Julia version of the code.

A generative model

The example is a Bayesian Network which describes the joint distribution P(flu, TB, cough). First, here is the generative model, which I drew on the back of an envelope.
[Back-of-envelope sketch of the generative model: flu and TB are independent causes of cough, with P(flu)=0.2, P(TB)=0.1, P(cough|flu or TB)=0.8, P(cough|neither)=0.1]

Let’s write down our generative model in Matlab code.

function [state] = model()
% Run the generative model once and return a sampled state of the world
flipCoin = @(p) lt(rand,p);      % true with probability p
state.flu = flipCoin(0.2);       % P(flu) = 0.2
state.TB = flipCoin(0.1);        % P(TB) = 0.1
if or(state.flu,state.TB)
    state.cough = flipCoin(0.8); % P(cough | flu or TB) = 0.8
else
    state.cough = flipCoin(0.1); % P(cough | neither)   = 0.1
end
return

When we call the model() function, the generative model runs once and a sampled state of the world is returned in the state structure. We can run the model many times and look at the distribution of states of the world according to our generative model.
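Here is a minimal sketch of doing that: draw 10,000 samples and tally how often each variable comes up true (the expected values follow from the probabilities in the model).

nSamples = 10^4;
flu = false(nSamples,1); TB = false(nSamples,1); cough = false(nSamples,1);
for n = 1:nSamples
    state = model();        % run the generative model once
    flu(n) = state.flu;
    TB(n) = state.TB;
    cough(n) = state.cough;
end
fprintf('P(flu)   ~ %.2f\n', mean(flu))    % should be close to 0.2
fprintf('P(TB)    ~ %.2f\n', mean(TB))     % should be close to 0.1
fprintf('P(cough) ~ %.2f\n', mean(cough))  % should be close to 0.296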

Assessing conditional probabilities, or making ‘queries’

Another thing we might want to do is to evaluate conditional probability distributions. For example, we might want to know the probability of having the flu given we observe a cough, P(flu|cough). The way we can do this is conceptually very easy to grasp, especially with the use of a rejection sampler.

What we do is run the model many times. Each run produces a state. We are only interested in runs where the particular states we specify as observed (e.g. cough=true) occur, so we store the state of the model only on those runs where the condition we specify is true and reject the rest. Here is the code for a rejection sampler:

function queryReturn = rejectionQuery(model, queryFunc, conditionFunc)
% Rejection sampler: keep running the model until we have 10^4
% samples for which the condition holds, rejecting all the rest
queryReturn = [];
while lt(numel(queryReturn),10^4)
    state = model();
    if conditionFunc(state)                    % accept only if the observed condition is true
        queryReturn(end+1) = queryFunc(state); % store the queried variable for this sample
    end
end
return

We should remember that the rejection sampler will be inefficient in many situations. Once we step away from finite state spaces (i.e. discrete variables in our model) into continuous state spaces, this approach will simply not work: the proportion of runs in which the state of the generative model conforms to the given condition will be near zero. This is not a problem with the approach here, but simply with the nature of the sampling algorithm. There are others.

Having said that, there was a bit of a lightbulb moment when I understood how this was working. I’d say it’s well worth understanding this. Even though this sampler might be impractical for many situations, it still gives you a backup position in terms of your conceptual understanding. A lot of the details further down the line (I think) come down to how to achieve this functionality in a computationally efficient manner.

P(flu|cough)

Let’s run this to calculate P(flu|cough). We can do this by calling the rejectionQuery() function, passing in the model, the query and the condition.

queryReturn = rejectionQuery(...
    @model,...
    @(x) x.flu,... % query expression
    @(x) x.cough); % condition
 
fprintf('%.2f %%\n', sum(queryReturn)./numel(queryReturn) *100)

The result is a binary vector of flu observations from runs where cough=true. On one run, this gave an estimate of P(flu|cough) = 53.92%.
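For comparison, here is a quick sanity check: the exact value, worked out directly from the probabilities in the model.

% Exact P(flu|cough), from the probabilities in the generative model
pFlu = 0.2; pTB = 0.1;
pCough = (1 - (1-pFlu)*(1-pTB))*0.8 ... % cough given flu or TB
       + (1-pFlu)*(1-pTB)*0.1;          % cough given neither
pFluGivenCough = (0.8*pFlu) / pCough    % = 0.16/0.296 ~ 0.5405

So the exact answer is about 54.05%, which agrees with the sampled estimate.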

P(flu|cough&TB)

But how does our belief in having the flu change if we observe both a cough and TB? We can test this by changing the condition expression:

queryReturn = rejectionQuery(...
    @model,...
    @(x) x.flu,... % query expression
    @(x) and(x.cough,x.TB) ); % condition
 
fprintf('%.2f %%\n', sum(queryReturn)./numel(queryReturn) *100)

Running this gave me P(flu|cough&TB) = 19.71%. This is close to the exact answer of 20%: once TB is observed, the cough is fully explained and tells us nothing more about flu, so P(flu|cough,TB) = P(flu) = 0.2.

Sampling from the prior

We can sample from the prior by simply accepting every sample, i.e. setting the condition to always be true (evaluating the joint distribution, conditional upon nothing):

queryReturn = rejectionQuery(...
    @model,...
    @(x) x.flu,... % query expression
    @(x) true); % condition
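Since every sample is accepted, nothing is conditioned on, and the estimate should simply recover the prior, P(flu) = 0.2:

fprintf('%.2f %%\n', sum(queryReturn)./numel(queryReturn) *100) % should come out near 20%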

What is good about this?

  1. The focus upon the generative process. The generative model diagram, typical in a Bayes Net approach, describes the joint distribution of the binary variables. This is all fine, but what is nice about the generative model specification is that it just consists of the mechanism. There is a function called flipCoin(), but nowhere did I have to write down anything about a binomial distribution. The focus is upon the generative process. I imagine this is going to be important when conducting inference on generic programs which simply aren’t composed of distributions.
  2. Generality. Of course, this is a Bayes Net and there is a wide range of tools and approaches for working with Bayesian Networks. So this example does not touch upon the main advantage of the approach yet… the ability to compose arbitrary(?) models and conduct inference with them. In theory, you should be able to just make a call in the form queryReturn = Query(model, query, condition) and get back an answer to our query regardless of the nature of the model.
  3. The code is pretty concise.

Compromises forced by Matlab

Not sure if it’s a compromise, but in the Matlab code we have some lines that mutate state. For example, in rejectionQuery() we initialise queryReturn as an empty array and then append values to it. But this is more of a functional programming issue than one of probabilistic programming per se.

In the examples by Noah Goodman, the rejection sampler was recursive: if the condition is true then we save the data, otherwise we call the model function again. Matlab has a maximum recursion depth of 500 (by default), so it is not well suited to this approach. This was easily solved by using iteration rather than recursion (a sketch of the recursive version is below). For the moment I don’t see this as being a big deal, but perhaps the appeal will become clear further down the line.
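For illustration, this is roughly what the recursive formulation looks like in Matlab (a sketch of the idea, not code from the talk): it draws a single sample, recursing until the condition holds, and you would call it 10^4 times to build up the same queryReturn vector. Each rejected sample adds a level to the call stack, which is where the recursion limit starts to bite for rare conditions.

function sample = rejectionQuerySingle(model, queryFunc, conditionFunc)
% Recursive rejection sampling: run the model once; if the condition
% holds return the query, otherwise try again by recursing.
state = model();
if conditionFunc(state)
    sample = queryFunc(state);
else
    sample = rejectionQuerySingle(model, queryFunc, conditionFunc);
end
return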

There are probably some other things here that are too subtle for me to detect right now, given that I am not really up to speed on functional programming. But overall implementing this example in Matlab was pretty quick and painless.

To do list

  • Currently limited to querying only one variable. Fix this.
  • Try to implement Metropolis-Hastings queries.
  • Examine a model with continuous variables.
  • Examine an arbitrary model that is not a Bayesian Network.

I strongly suspect that there is a reason why people use functional programming languages for probabilistic programming, and that implementing some of the items above in Matlab would be non-trivial, if not impossible. Either way, this should be a good way to learn.


3 thoughts on “Probabilistic programming: a tentative first encounter”

  1. Junpeng Lao - 2016/08/12

    Nice post!
    This reminds me of something I came across on Github (https://github.com/adambard/functools-for-matlab). I am wondering how your code performs speed-wise?

    1. Ben - 2016/08/13

      I’m glad you found it interesting. I’m trying to move over to a more functional style of coding, so I might learn through refactoring some existing code to a functional style. If so, I’ll definitely use that repo 🙂

    2. Ben - 2016/08/13

      I didn’t test this code for speed because it was just for my understanding, rather than for real usage. I think the single biggest speed increase would come from doing the `rejectionQuery` in parallel over multiple cores.

      Or to just use Julia 😉
