I’ve been using MCMC, but I’ve wanted to flesh out my knowledge and explore the space of sampling approaches a little more. One very simple, yet inefficient, method is rejection sampling. Here is a little Matlab example I put together after seeing how easy it was.


%% true probability distribution
true_func = @(x) betapdf(x,1+1,1+10);
%% Do rejection sampling
% create many samples on interval 0-1
x_samples = rand(10^6,1);
% evaluate for each sample
sample_value = true_func(x_samples);
% accept each sample with probability proportional to its density value,
% scaled by the largest density value seen
max_value = max(sample_value);
accepted = rand(10^6,1) < (sample_value/max_value);
samples = x_samples(accepted);
%% plot
subplot(1,2,1)
x = linspace(0,1,1000);
plot(x, true_func(x) )
subplot(1,2,2)
hist(samples,1000)
proportion_rejected = sum(accepted==0) / numel(x_samples);
title([num2str(proportion_rejected*100) '% of samples rejected'])
% example inspired by http://www.r-bloggers.com/a-simple-explanation-of-rejection-sampling-in-r/

This produces the following output.

[Figure: the true distribution (left) and a histogram of the accepted samples (right)]

So we have a good set of samples which approximates the true distribution well. A few notes:

  • Generation of the samples (before rejection) must come from an appropriately broad proposal distribution, one which encompasses the regions where the target has appreciable density.
  • If it is costly to evaluate the function for each sample, then this is inefficient because many of those samples will be subsequently thrown away.
  • However, it is very, very simple to code (see the sketch after this list).
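
To underline that last point, the whole evaluate/compare/accept step fits in a couple of lines. Here is a sketch of a tiny helper; the name reject_sample and the idea of passing the proposal draws in as an argument are my own, and it uses the empirical maximum of the evaluated density as the scaling constant, just like the code above.

% Sketch of a minimal rejection-sampling helper (hypothetical, not from the
% example above): accepts each proposal draw with probability f(x)/max(f(x)).
% It evaluates f twice, which is wasteful, but keeps the sketch to one line.
reject_sample = @(f, proposal_draws) ...
    proposal_draws(rand(size(proposal_draws)) < f(proposal_draws) ./ max(f(proposal_draws)));
% usage with the Beta target from the example above:
samples = reject_sample(true_func, rand(10^6,1));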

This is a nice figure to convey the intuition.

[Figure: the intuition behind rejection sampling]

Worked example: parameter estimation

Now let’s use this approach in a toy example to estimate the mean and sigma of a Gaussian distribution, based upon 20 draws.


true_mean = 0;
true_sigma = 1;
% likelihood_func = @(x, mean, sigma) normpdf(x, mean, sigma);
% the same calculation as the commented line above, but vectorised over
% many (mean, sigma) proposals at once, for speed
likelihood_func = @(x, mean, sigma) ...
    prod(normpdf(repmat(x,    [1 numel(mean)]), ...
                 repmat(mean, [1 numel(x)])', ...
                 repmat(sigma,[1 numel(x)])' ), 1);
%% generate data
N=20;
observed_data = normrnd(true_mean, true_sigma, [N 1]);
%% Do rejection sampling
% create many samples for mean and sigma
n_samples = 10^6;
mean_samples = (rand(n_samples,1)-0.5)*5;   % uniform proposal on [-2.5, 2.5]
sigma_samples = rand(n_samples, 1) * 10;    % uniform proposal on (0, 10]
% evaluate likelihood for each (mean, sigma) sample
sample_value = likelihood_func(observed_data, mean_samples, sigma_samples);
% accept in proportion to highest
max_value = max(sample_value);
accepted = rand(1,n_samples) < (sample_value./max_value);
mean_samples_accepted = mean_samples(accepted);
sigma_samples_accepted = sigma_samples(accepted);
%% plot
subplot(1,2,1)
plot(mean_samples,sigma_samples,'.')
xlabel('mean')
ylabel('sigma')
title('initial samples')
subplot(1,2,2)
plot(mean_samples_accepted,sigma_samples_accepted,'.')
hold on
plot(true_mean, true_sigma, 'r.','MarkerSize',5^2)
xlabel('mean')
ylabel('sigma')
title(['posterior samples, ' num2str(sum(accepted==0)/n_samples*100) ...
    '% rejection rate'])

The code above results in a reasonable set of samples from the posterior. However, note the exceptionally high rejection rate. This is because the proposal distribution is broad (note the axis scales): in real situations we may have very little knowledge of where the posterior density is focussed.

[Figure: initial samples (left) and posterior samples (right) in mean–sigma space, with the true values marked]
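
To make the link between proposal breadth and rejection rate more concrete, here is a small sketch that re-runs the accept/reject step for uniform proposal boxes of different widths. The widths, the reduced sample count, and the box layout are arbitrary illustrative choices on my part.

% Sketch: acceptance fraction as a function of proposal-box width.
% For each width w, the mean proposal is uniform on [-w/2, w/2] and the
% sigma proposal is uniform on (0, w]. Widths and sample count are
% illustrative assumptions, not part of the original example.
widths = [2 5 10 20];
n = 10^5;
for w = widths
    m = (rand(n,1)-0.5) * w;
    s = rand(n,1) * w;
    v = likelihood_func(observed_data, m, s);
    acc = rand(1,n) < v ./ max(v);
    fprintf('proposal width %4.1f: %5.2f%% of samples accepted\n', w, 100*mean(acc));
end

As the box grows, roughly the same amount of posterior mass gets spread over a larger proposal volume, so the fraction of accepted samples falls.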

Thoughts

Rejection sampling is easy to implement, but it is very inefficient. However, I can imagine some interesting schemes where you start off with many samples over a broad region of parameter space to get an initial indication of the region of interest. You could then repeat rejection sampling with the proposal distribution focussed upon this interesting region of parameter space.
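
Here is a rough sketch of that two-stage idea, reusing the accepted samples from the run above to define a tighter uniform proposal box for a second pass. The 10% padding and the variable names are my own illustrative choices.

% Sketch: second rejection-sampling pass with the uniform proposal focussed
% on the region indicated by the first pass. The padding is arbitrary.
pad = 0.1;                                     % widen the box by 10% on each side
m_lo = min(mean_samples_accepted);  m_hi = max(mean_samples_accepted);
s_lo = min(sigma_samples_accepted); s_hi = max(sigma_samples_accepted);
m_w = m_hi - m_lo;  s_w = s_hi - s_lo;
m_lo2 = m_lo - pad*m_w;            m_hi2 = m_hi + pad*m_w;
s_lo2 = max(eps, s_lo - pad*s_w);  s_hi2 = s_hi + pad*s_w;   % keep sigma > 0
mean_samples2  = m_lo2 + rand(n_samples,1) * (m_hi2 - m_lo2);
sigma_samples2 = s_lo2 + rand(n_samples,1) * (s_hi2 - s_lo2);
sample_value2  = likelihood_func(observed_data, mean_samples2, sigma_samples2);
accepted2      = rand(1,n_samples) < sample_value2 ./ max(sample_value2);
fprintf('second-pass rejection rate: %.1f%%\n', 100*mean(~accepted2));

One caveat: the second pass only approximates the same posterior if the tightened box still covers essentially all of the posterior mass, which is why some padding seems sensible.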

Remaining questions

  • I am not sure if the proposal distribution has to be uniform (see the sketch after this list).
  • If not, does the proposal distribution represent our prior beliefs?
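
On the first question, the textbook formulation of rejection sampling does allow a non-uniform proposal: draw x from any proposal density g(x) and accept with probability f(x) / (M*g(x)), where M is a constant chosen so that M*g(x) >= f(x) everywhere. Here is a minimal sketch for the Beta target from the first example; the normal proposal parameters and the grid-based estimate of M are illustrative assumptions of mine.

% Sketch: rejection sampling from the Beta(2,11) target with a non-uniform
% (normal) proposal g(x). Accept x ~ g with probability f(x) / (M*g(x)),
% where M is chosen so that M*g(x) >= f(x). The proposal parameters and the
% grid-based M are illustrative assumptions.
true_func = @(x) betapdf(x,1+1,1+10);        % same Beta target as above
g_mu = 0.15; g_sigma = 0.15;                 % assumed proposal, roughly tracking the target
g = @(x) normpdf(x, g_mu, g_sigma);
x_grid = linspace(0, 1, 10^4);
M = max(true_func(x_grid) ./ g(x_grid));     % numerical envelope constant, estimated on a grid
x_prop = normrnd(g_mu, g_sigma, [10^6 1]);   % draws from the proposal
accept = rand(size(x_prop)) < true_func(x_prop) ./ (M * g(x_prop));
samples = x_prop(accept);                    % proposals outside [0,1] have f(x)=0, so are never accepted

When f is a normalised density the overall acceptance rate is roughly 1/M, so a proposal that tracks the target can be far more efficient than a flat one.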

TODO

update example based on insights in this great video

 

Comments

  1. Hi

    This is really interesting (and the video really helps explain it).

    Do you think it’s sufficient to assume that you’ll have found the maximum pdf value through sampling to use as the acceptance criteria? Obviously in your cases it works, since the distribution isn’t too broad, and you’ve taken a large number of samples. If the distribution were black-box or hard to compute, and you only wanted several samples at a time, how would you find a max value to use to scale your acceptance criterion?

