I’ve been using MCMC, but I’ve wanted to flesh out my knowledge and explore the space of sampling approaches a little more. One very simple, yet inefficient method, is rejection sampling. Here is a little Matlab example I put together after seeing how easy it was.
https://gist.github.com/drbenvincent/486270cd169f7c420c6c
Which results in this output.
So we have a good set of samples which well approximates the true distribution. A few notes:
- Generation of the samples (before rejection) must come from an appropriately broad prior which encompasses the relevant regions.
- If it is costly to evaluate the function for each sample, then this is inefficient because many of those samples will be subsequently thrown away.
- However, it is very very simple to code.
This is a nice figure to convey the intuition.
Worked example: parameter estimation
Now let’s use this approach in a toy example to estimate the mean and sigma of a Gaussian distribution, based upon 20 draws.
https://gist.github.com/drbenvincent/28ff20b88d87846e4204
The code above results in a reasonable set of samples from the posterior. However, note the exceptionally high rejection rate. This is because the proposal distribution is broad (note the axis scales) because in real situations we may have very little knowledge of where the posterior density is focussed.
Thoughts
Rejection sampling is easy to implement, but it is very inefficient. However, I can imagine some interesting schemes were you start off with many samples over a broad region of parameter space to get an initial indication of the region of parameter space of interest. You could then repeat rejection sampling with with the proposal distribution focussed upon this interesting region of parameter space.
Remaining questions
- I am not sure if the proposal distribution has to be uniform.
- If not, does the proposal distribution represent our prior beliefs?
TODO
update example based on insights in this great video
Hi
This is really interesting (and the video really helps explain it).
Do you think it’s sufficient to assume that you’ll have found the maximum pdf value through sampling to use as the acceptance criteria? Obviously in your cases it works, since the distribution isn’t too broad, and you’ve taken a large number of samples. If the distribution were black-box or hard to compute, and you only wanted several samples at a time, how would you find a max value to use to scale your acceptance criterion?
(I commented mostly to ask how I could be notified of future posts, since this so interesting, but I see that I can select that here – it may be worth creating a follow button which is easier to find, unless I completely missed it while searching)