Importance sampling is related to rejection sampling, which I looked at in the last post. Here is a short demo.
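As a minimal sketch of the idea (not necessarily the setup used in the post), assume the target is a standard normal, the proposal is a wider normal with standard deviation 2, and the quantity of interest is E[x^2], which is 1 under the target. The helper names `normal_pdf` and `importance_estimate` are made up for illustration.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of a normal distribution, written out to avoid extra dependencies."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def importance_estimate(f, n_samples=100_000, seed=0):
    """Self-normalised importance sampling estimate of E_p[f(x)].

    Assumed setup: target p = N(0, 1), proposal q = N(0, 2^2).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 2.0, size=n_samples)                 # draw from the proposal q
    w = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 2.0)    # importance weights p(x) / q(x)
    w /= w.sum()                                             # normalise the weights
    return np.sum(w * f(x))                                  # weighted average, no draws rejected

estimate = importance_estimate(lambda x: x ** 2)
print(estimate)  # should land close to 1, the variance of the target
```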
A problem with rejection sampling is that many samples may be drawn in regions of low probability mass. This leads to a high rate of attrition, with many samples being rejected. In importance sampling this is less of an issue for ending up with enough samples to represent the distribution accurately, since every sample is kept and weighted rather than discarded. The same basic problem remains, though: the probability is still evaluated at many points in parameter space that have very low or zero probability.
So let’s do the same thing from the last post and use this to do parameter estimation.
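The model from the last post is not shown here, so the following is only a sketch under assumed conditions: the data are normal with known standard deviation, the unknown parameter is the mean, and a flat prior on [-10, 10] doubles as the proposal. In that case the importance weights are just the likelihoods. All names and numbers (`true_mu`, `n_draws`, the prior range) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: assumed normal with unknown mean and known sd.
true_mu, sigma = 3.0, 1.0
data = rng.normal(true_mu, sigma, size=50)

# Draw candidate parameter values from the proposal (here, the flat prior).
n_draws = 20_000
mu_draws = rng.uniform(-10.0, 10.0, size=n_draws)

# Log-likelihood of the whole data set for each candidate mean
# (constant terms dropped, since they cancel when the weights are normalised).
resid = data[None, :] - mu_draws[:, None]
log_lik = -0.5 * np.sum(resid ** 2, axis=1) / sigma ** 2

# Weights are proportional to the likelihood; subtract the max for numerical stability.
weights = np.exp(log_lik - log_lik.max())
weights /= weights.sum()

# Weighted posterior summaries.
posterior_mean = np.sum(weights * mu_draws)
posterior_sd = np.sqrt(np.sum(weights * (mu_draws - posterior_mean) ** 2))

# Effective sample size: far below n_draws because most draws get a weight near zero.
ess = 1.0 / np.sum(weights ** 2)

print(posterior_mean, posterior_sd, ess)
```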
Which results in this
Here is a nice little figure I found that helped with the intuition.