Visual Guide to Statistics. Part II: Bayesian Statistics
Part II introduces a different approach to parameter estimation, known as Bayesian statistics.
Basic definitions
We noted in the previous part that a uniformly best estimator rarely exists. An alternative way to compare risk functions is to look either at their averaged values (weighted by a prior distribution over the parameter) or at their maximum values, which cover the worst-case scenario.
In the Bayes interpretation the parameter $\theta$ is itself treated as a random variable with a prior distribution $\pi$ on $\Theta$. For an estimator $\hat{g}$ the averaged value

$$R(\pi, \hat{g}) = \int_{\Theta} R(\theta, \hat{g}) \, \pi(d\theta)$$

is called the Bayes risk of $\hat{g}$ with respect to $\pi$. An estimator $\tilde{g}$ is called a Bayes estimator if it attains

$$R(\pi, \tilde{g}) = \inf_{\hat{g}} R(\pi, \hat{g}).$$

The right-hand side of the equation above is called the Bayes risk of the prior $\pi$. The function $\pi$ is called the prior distribution of $\theta$.

In the following we will denote the conditional density of $X$ given $\theta$ by $f(x \mid \theta)$ and the joint density of $(X, \theta)$ by $f(x \mid \theta)\,\pi(\theta)$. Before the experiment our knowledge about $\theta$ is summarized by the prior $\pi$; after observing $X = x$ it is updated to the posterior distribution

$$\pi(\theta \mid x) = \frac{f(x \mid \theta)\, \pi(\theta)}{\int_{\Theta} f(x \mid \theta')\, \pi(\theta')\, d\theta'}.$$
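To make the definition concrete, here is a minimal Python sketch (not part of the original derivation; the estimators and the prior are made up for illustration). It approximates the Bayes risk of two estimators of a normal mean by Monte Carlo: the risk is estimated for each value of the parameter and then averaged over draws from the prior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (not from the post): X_1..X_n ~ N(theta, 1) and two
# estimators of theta under quadratic loss.
n = 10
est_mean = lambda x: x.mean()          # sample mean
est_shrunk = lambda x: 0.9 * x.mean()  # shrinks the sample mean towards 0

def risk(theta, estimator, n_sim=5_000):
    """Monte Carlo estimate of the risk R(theta, estimator)."""
    samples = rng.normal(theta, 1.0, size=(n_sim, n))
    estimates = np.array([estimator(row) for row in samples])
    return np.mean((estimates - theta) ** 2)

# Prior pi = N(0, 1): the Bayes risk is the risk averaged over the prior,
# approximated here by averaging over draws of theta from the prior.
thetas = rng.normal(0.0, 1.0, size=200)
print(np.mean([risk(t, est_mean) for t in thetas]))    # ~ 1/n = 0.1
print(np.mean([risk(t, est_shrunk) for t in thetas]))  # slightly smaller
```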
Posterior risk
Recall that the risk function is the expected value of a loss function:

$$R(\theta, \hat{g}) = \mathbb{E}_\theta \big[ L(\theta, \hat{g}(X)) \big] = \int_{\mathcal{X}} L(\theta, \hat{g}(x))\, f(x \mid \theta)\, dx.$$

Then the Bayes risk can be rewritten by changing the order of integration:

$$R(\pi, \hat{g}) = \int_{\Theta} \int_{\mathcal{X}} L(\theta, \hat{g}(x))\, f(x \mid \theta)\, dx\, \pi(\theta)\, d\theta = \int_{\mathcal{X}} \left[ \int_{\Theta} L(\theta, \hat{g}(x))\, \pi(\theta \mid x)\, d\theta \right] m(x)\, dx,$$

where $m(x) = \int_{\Theta} f(x \mid \theta)\, \pi(\theta)\, d\theta$ is the marginal density of $X$. The term

$$R(\hat{g} \mid x) = \int_{\Theta} L(\theta, \hat{g}(x))\, \pi(\theta \mid x)\, d\theta$$

is called a posterior risk of $\hat{g}$ given $X = x$, because it averages the loss over the posterior distribution. Minimizing the posterior risk for every $x$ separately therefore minimizes the Bayes risk, which is how Bayes estimators are found in practice.

Say, for the quadratic loss $L(\theta, a) = (g(\theta) - a)^2$ the posterior and Bayes risks are respectively

$$R(\hat{g} \mid x) = \mathbb{E}\big[ (g(\theta) - \hat{g}(x))^2 \mid X = x \big]$$

and

$$R(\pi, \hat{g}) = \mathbb{E}\big[ (g(\theta) - \hat{g}(X))^2 \big].$$

The posterior risk is minimized by the posterior mean, so the Bayes estimator for quadratic loss is $\tilde{g}(x) = \mathbb{E}[g(\theta) \mid X = x]$.
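As a quick sanity check of this fact, the following sketch (with an arbitrary, made-up discrete posterior) compares a grid search over actions with the posterior mean.

```python
import numpy as np

# A minimal numeric check that, under quadratic loss, the posterior mean
# minimizes the posterior risk. The posterior here is an arbitrary discrete
# distribution on a grid of parameter values (purely illustrative).
theta = np.linspace(0.0, 1.0, 501)
weights = np.exp(-((theta - 0.3) ** 2) / 0.02)
posterior = weights / weights.sum()

def posterior_risk(a):
    # E[(theta - a)^2 | X = x] for the discrete posterior above
    return np.sum(posterior * (theta - a) ** 2)

actions = np.linspace(0.0, 1.0, 2001)
best_action = actions[np.argmin([posterior_risk(a) for a in actions])]
posterior_mean = np.sum(posterior * theta)
print(best_action, posterior_mean)  # the two agree up to grid resolution
```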
Let’s take an example of estimating the probability parameter of a binomial distribution. Let $X \sim \operatorname{Bin}(n, p)$ with unknown $p \in [0, 1]$ and $g(p) = p$. We take the quadratic loss function

$$L(p, a) = (p - a)^2.$$

On the other hand, we have the density

$$f(x \mid p) = \binom{n}{x} p^x (1 - p)^{n - x}, \qquad x = 0, \dots, n.$$

If we take the prior uniform distribution $\pi = \mathcal{U}([0, 1])$, then the posterior density is

$$\pi(p \mid x) = \frac{p^x (1 - p)^{n - x}}{B(x + 1, n - x + 1)},$$

where we have the beta function in the denominator:

$$B(a, b) = \int_0^1 t^{a - 1} (1 - t)^{b - 1}\, dt,$$

so that $p \mid X = x \sim \operatorname{Beta}(x + 1, n - x + 1)$. Then the Bayes estimator will be the posterior mean

$$\tilde{g}(X) = \frac{X + 1}{n + 2},$$

and the Bayes risk:

$$R(\pi, \tilde{g}) = \frac{1}{6(n + 2)}.$$
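Here is a short numerical sketch of this example (the sample values are made up) using scipy's Beta distribution; it also checks the Bayes risk formula by averaging the risk of the estimator over the uniform prior.

```python
import numpy as np
from scipy import stats

# Sketch of the Bin(n, p) example with the uniform prior (illustrative values).
n, x = 20, 13                               # sample size and observed successes

# Posterior is Beta(x + 1, n - x + 1); its mean is the Bayes estimator.
posterior = stats.beta(x + 1, n - x + 1)
print((x + 1) / (n + 2), posterior.mean())  # identical

# Bayes risk under quadratic loss: average the risk of (X + 1) / (n + 2)
# over the uniform prior and compare with the closed form 1 / (6 (n + 2)).
p = np.linspace(0.0, 1.0, 2001)
risk = (n * p * (1 - p) + (1 - 2 * p) ** 2) / (n + 2) ** 2
print(risk.mean(), 1 / (6 * (n + 2)))
```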
Let’s take another example: let $X_1, \dots, X_n \sim \mathcal{N}(\mu, \sigma^2)$ be i.i.d. with known variance $\sigma^2$, and put a normal prior on the mean, $\mu \sim \mathcal{N}(\nu, \tau^2)$. Taking the density for the sample and multiplying it by the prior density, we get the posterior distribution

$$\mu \mid X \sim \mathcal{N}\big( \mu_{\text{post}},\, \sigma_{\text{post}}^2 \big),$$

where

$$\mu_{\text{post}} = \frac{n \tau^2 \bar{X}_n + \sigma^2 \nu}{n \tau^2 + \sigma^2}, \qquad \sigma_{\text{post}}^2 = \frac{\sigma^2 \tau^2}{n \tau^2 + \sigma^2}, \qquad \bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i.$$
For the quadratic loss function the Bayes estimator is the posterior mean:

$$\tilde{g}(X) = \frac{n \tau^2 \bar{X}_n + \sigma^2 \nu}{n \tau^2 + \sigma^2},$$

a weighted average of the sample mean and the prior mean. The posterior variance does not depend on the observed sample, so it is also the Bayes risk:

$$R(\pi, \tilde{g}) = \frac{\sigma^2 \tau^2}{n \tau^2 + \sigma^2}.$$
Fig. 1. Bayesian inference for normal distribution.
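Here is a small sketch of the same conjugate update in Python (the parameter values are illustrative, not taken from the figure).

```python
import numpy as np

# Conjugate normal update: X_i ~ N(mu, sigma^2) with known sigma and prior
# mu ~ N(nu, tau^2). All parameter values below are illustrative.
rng = np.random.default_rng(1)
sigma, nu, tau = 2.0, 0.0, 1.5
n, true_mu = 30, 1.0
x = rng.normal(true_mu, sigma, size=n)

x_bar = x.mean()
post_mean = (n * tau**2 * x_bar + sigma**2 * nu) / (n * tau**2 + sigma**2)
post_var = sigma**2 * tau**2 / (n * tau**2 + sigma**2)

# Under quadratic loss the Bayes estimator is the posterior mean: a weighted
# compromise between the sample mean and the prior mean.
print(x_bar, post_mean, post_var)
```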
Minimax estimator
For an estimator $\hat{g}$ the value

$$R^*(\hat{g}) = \sup_{\theta \in \Theta} R(\theta, \hat{g})$$

is called the maximum risk, and

$$\inf_{\hat{g}} \sup_{\theta \in \Theta} R(\theta, \hat{g})$$

is the minimax risk; a corresponding estimator attaining it is called a minimax estimator.

There is a useful connection to Bayes estimators. Suppose $\tilde{g}$ is a Bayes estimator for a prior $\pi$ such that

$$\sup_{\theta \in \Theta} R(\theta, \tilde{g}) = R(\pi, \tilde{g}),$$

then $\tilde{g}$ is minimax and $\pi$ is called a least favorable prior. Indeed, then for any other estimator $\hat{g}$

$$\sup_{\theta \in \Theta} R(\theta, \hat{g}) \geq \int_{\Theta} R(\theta, \hat{g})\, \pi(d\theta) = R(\pi, \hat{g}) \geq R(\pi, \tilde{g}) = \sup_{\theta \in \Theta} R(\theta, \tilde{g}),$$

and therefore $\tilde{g}$ is minimax.

Sometimes the risk of a Bayes estimator can be constant: $R(\theta, \tilde{g}) = c$ for all $\theta$. Then

$$\sup_{\theta \in \Theta} R(\theta, \tilde{g}) = c = \int_{\Theta} R(\theta, \tilde{g})\, \pi(d\theta) = R(\pi, \tilde{g}),$$

and the condition above holds automatically, so $\tilde{g}$ is minimax.
Let’s get back to the example with the binomial distribution: $X \sim \operatorname{Bin}(n, p)$. Again we use quadratic loss, but this time we take a parameterized Beta distribution as the prior:

$$\pi(p) = \frac{p^{\alpha - 1} (1 - p)^{\beta - 1}}{B(\alpha, \beta)}, \qquad \alpha, \beta > 0.$$

Note that for $\alpha = \beta = 1$ it reduces to the uniform prior from the example above. We use our prior knowledge that the Beta distribution is conjugate to the binomial: for a random variable $X \sim \operatorname{Bin}(n, p)$ with $p \sim \operatorname{Beta}(\alpha, \beta)$ the posterior is again a Beta distribution,

$$p \mid X = x \sim \operatorname{Beta}(x + \alpha,\, n - x + \beta).$$

Recall that for quadratic loss the expected value of the posterior distribution

$$\tilde{g}(X) = \frac{X + \alpha}{n + \alpha + \beta}$$

is a Bayes estimator, and it provides risk

$$R(p, \tilde{g}) = \frac{n p (1 - p) + \big(\alpha (1 - p) - \beta p\big)^2}{(n + \alpha + \beta)^2}.$$

If we choose $\alpha = \beta = \frac{\sqrt{n}}{2}$, then

$$R(p, \tilde{g}) = \frac{1}{4 (1 + \sqrt{n})^2}.$$

Such risk doesn't depend on $p$, therefore

$$\tilde{g}(X) = \frac{X + \sqrt{n}/2}{n + \sqrt{n}}$$

is a minimax estimator and $\operatorname{Beta}\big(\tfrac{\sqrt{n}}{2}, \tfrac{\sqrt{n}}{2}\big)$ is a least favorable prior.
Fig. 2. Bayesian inference for binomial distribution. Note that when least favorable prior is chosen, Bayes and minimax estimators coincide regardless of the sample value.
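The following sketch compares the two estimators numerically (the sample size is arbitrary): the risk of $X/n$ peaks at $p = 1/2$, while the risk of the estimator based on the least favorable prior stays flat.

```python
import numpy as np

# Risk (MSE) of two estimators of p for X ~ Bin(n, p):
#   - the standard estimator X / n,
#   - the Bayes estimator under the Beta(sqrt(n)/2, sqrt(n)/2) prior,
#     which is minimax because its risk is constant in p.
n = 25
p = np.linspace(0.0, 1.0, 501)
a = b = np.sqrt(n) / 2

risk_standard = p * (1 - p) / n
risk_minimax = (n * p * (1 - p) + (a * (1 - p) - b * p) ** 2) / (n + a + b) ** 2

print(risk_standard.max())                                  # 1 / (4 n)
print(risk_minimax.max(), 1 / (4 * (1 + np.sqrt(n)) ** 2))  # constant risk
```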
Least favorable sequence of priors
Let $(\pi_k)_{k \geq 1}$ be a sequence of priors and let $\tilde{g}_k$ be the Bayes estimator for $\pi_k$ with Bayes risk $r_k = R(\pi_k, \tilde{g}_k)$, such that $r_k \to r < \infty$. Then the sequence $(\pi_k)$ is called least favorable if for every prior $\pi$

$$R(\pi, \tilde{g}_\pi) \leq r,$$

where $\tilde{g}_\pi$ denotes the Bayes estimator for $\pi$.

Let $\hat{g}$ be an estimator with

$$\sup_{\theta \in \Theta} R(\theta, \hat{g}) = \lim_{k \to \infty} r_k = r.$$

Then for any estimator $g'$

$$\sup_{\theta \in \Theta} R(\theta, g') \geq R(\pi_k, g') \geq R(\pi_k, \tilde{g}_k) = r_k \quad \text{for all } k,$$

and therefore

$$\sup_{\theta \in \Theta} R(\theta, g') \geq r = \sup_{\theta \in \Theta} R(\theta, \hat{g}),$$

hence $\hat{g}$ is minimax. Moreover, for any prior $\pi$

$$R(\pi, \tilde{g}_\pi) \leq R(\pi, \hat{g}) \leq \sup_{\theta \in \Theta} R(\theta, \hat{g}) = r,$$

hence $(\pi_k)$ is a least favorable sequence of priors.
Let’s get back to our previous example of estimating the mean of a normal distribution with known $\sigma^2$: $X_1, \dots, X_n \sim \mathcal{N}(\mu, \sigma^2)$ with priors $\pi_k = \mathcal{N}(\nu, \tau_k^2)$, where $\tau_k \to \infty$. For any $k$ the Bayes estimator and its Bayes risk are

$$\tilde{g}_k(X) = \frac{n \tau_k^2 \bar{X}_n + \sigma^2 \nu}{n \tau_k^2 + \sigma^2}, \qquad r_k = R(\pi_k, \tilde{g}_k) = \frac{\sigma^2 \tau_k^2}{n \tau_k^2 + \sigma^2}.$$

Since the risk is bounded from above:

$$r_k = \frac{\sigma^2 \tau_k^2}{n \tau_k^2 + \sigma^2} \leq \frac{\sigma^2}{n} \quad \text{for all } k,$$

by the Lebesgue Dominated Convergence Theorem¹ we have

$$\lim_{k \to \infty} r_k = \frac{\sigma^2}{n}.$$

Since for the estimator $\hat{g}(X) = \bar{X}_n$ the equality

$$\sup_{\mu} R(\mu, \bar{X}_n) = \frac{\sigma^2}{n} = \lim_{k \to \infty} r_k$$

holds, $\bar{X}_n$ is a minimax estimator and $(\pi_k)$ is a least favorable sequence of priors.
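A few lines of Python (with arbitrary values of $\sigma$ and $n$) show how the Bayes risks of this sequence of priors approach the risk of the sample mean from below.

```python
import numpy as np

# Bayes risks r_k = sigma^2 tau_k^2 / (n tau_k^2 + sigma^2) for the normal
# priors N(nu, tau_k^2) with growing tau_k (illustrative values). They
# approach sigma^2 / n, the constant risk of the sample mean, from below.
sigma, n = 2.0, 10
taus = np.array([0.5, 1.0, 2.0, 5.0, 10.0, 100.0])

bayes_risks = sigma**2 * taus**2 / (n * taus**2 + sigma**2)
print(bayes_risks)
print(sigma**2 / n)  # the limit, equal to the sup-risk of the sample mean
```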
¹ Lebesgue Dominated Convergence Theorem: suppose there is a measurable space $(\Omega, \mathcal{F})$ with measure $\mu$. Also let $f$ and $f_1, f_2, \dots$ be measurable functions on $\Omega$ with $f_k \to f$ almost everywhere. Then if there exists an integrable function $h$ defined on the same space such that $|f_k| \leq h$ almost everywhere for all $k$, then $f_k$ and $f$ are integrable and

$$\lim_{k \to \infty} \int_{\Omega} f_k\, d\mu = \int_{\Omega} f\, d\mu.$$