$$\DeclareMathOperator*{\argmin}{arg\,min} \DeclareMathOperator*{\argmax}{arg\,max}$$

Idea of MLE: the “likelihood” is the probability of observing the dataset under a given parameter, so the parameter that maximizes the likelihood is a plausible estimate of the true parameter.

The Definition of MLE

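Step 1, make assumptions: the data points in $D$ are drawn independently from a distribution $p(x; θ)$ whose form is known and whose parameter $θ$ is unknown.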

Now estimate the parameters by maximizing the likelihood.

Likelihood: the product of the probabilities (or probability densities) of the individual data points in the dataset.

Given a sample dataset $D$, the likelihood is a function of $θ$:

$$L(θ) = \prod_{x \in D} p(x; θ)$$

Since log is monotonically increasing, the log-likelihood has the same maximizer as the likelihood and is often easier to work with:

$$LL(θ) = \log L(θ) = \sum_{x \in D} \log p(x; θ)$$
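One practical reason to prefer the log-likelihood is numerical: a product of many probabilities can underflow in floating point, while the sum of their logs stays finite. The sketch below assumes a Bernoulli model and a made-up dataset of 3000 coin flips; both are illustrative choices, not part of the notes above.

```python
import math

def likelihood(data, p):
    """Product of Bernoulli probabilities p(x; p) over the dataset."""
    result = 1.0
    for x in data:
        result *= p if x == 1 else 1.0 - p
    return result

def log_likelihood(data, p):
    """Sum of log Bernoulli probabilities over the dataset."""
    return sum(math.log(p if x == 1 else 1.0 - p) for x in data)

# Hypothetical dataset: 3000 coin flips, 60% of them heads.
data = [1] * 1800 + [0] * 1200

print(likelihood(data, 0.6))      # the raw product underflows in double precision
print(log_likelihood(data, 0.6))  # the log-likelihood remains a finite number
```

The log-likelihood can still be compared across parameter values (here it is larger at `p = 0.6` than at `p = 0.5`), which is all that maximization needs.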

Find the $θ$ that maximizes the probability of observing $D$:

$$\hat{θ}_{ML} = \operatorname*{arg\,max}_{θ} L(θ)$$
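As a minimal sketch of this maximization, assume a Bernoulli model with parameter $θ$ and a small made-up sample. A grid search over $θ$ is compared against the known closed-form Bernoulli MLE, the sample mean. The dataset and the grid resolution are assumptions chosen for illustration.

```python
import math

def log_likelihood(data, theta):
    """Bernoulli log-likelihood: sum of log p(x; theta) over the dataset."""
    ones = sum(data)
    zeros = len(data) - ones
    return ones * math.log(theta) + zeros * math.log(1.0 - theta)

# Hypothetical sample: 7 ones out of 10 observations.
data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

# Grid search over theta in (0, 1); the endpoints are excluded to avoid log(0).
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = max(grid, key=lambda theta: log_likelihood(data, theta))

# Closed-form MLE for the Bernoulli model: the sample mean.
closed_form = sum(data) / len(data)

print(theta_hat, closed_form)
```

Grid search is used only because it is easy to read; for models with more parameters a gradient-based optimizer would be the usual choice.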

The Properties of MLE

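Consistency: under suitable regularity conditions, the estimate $\hat{θ}_{ML}$ computed from $n$ samples converges in probability to the true parameter $θ_{0}$. That is, for every $ϵ > 0$: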

$$\lim_{n \to +\infty} P(|\hat{θ}_{ML} - θ_{0}| \geq ϵ) = 0$$
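The limit above can be illustrated with a small simulation. The sketch assumes a Bernoulli model with true parameter `theta_0 = 0.3`, a tolerance `eps = 0.05`, and 200 repetitions per sample size; all of these are arbitrary illustrative choices. It estimates how often the MLE misses the true parameter by at least `eps`.

```python
import random

random.seed(0)   # fixed seed so the run is reproducible
theta_0 = 0.3    # assumed true parameter

def mle(n):
    """Bernoulli MLE (the sample mean) computed from n simulated samples."""
    sample = [1 if random.random() < theta_0 else 0 for _ in range(n)]
    return sum(sample) / n

def exceed_rate(n, eps=0.05, trials=200):
    """Fraction of trials in which |theta_hat - theta_0| >= eps."""
    misses = sum(abs(mle(n) - theta_0) >= eps for _ in range(trials))
    return misses / trials

# Empirical estimate of P(|theta_hat - theta_0| >= eps) for growing n.
rates = {n: exceed_rate(n) for n in (10, 100, 1000, 10000)}
for n, rate in rates.items():
    print(n, rate)
```

The printed rates shrink toward zero as `n` grows, which is what convergence in probability predicts.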

Asymptotic normality: as $n$ grows, the distribution of the estimation error approaches a normal distribution.
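Under the usual regularity conditions, this property is commonly written as below. The Fisher information $I(θ)$ of a single observation is notation introduced here and does not appear elsewhere in these notes.

$$\sqrt{n}\,(\hat{θ}_{ML} - θ_{0}) \xrightarrow{d} \mathcal{N}\left(0,\; I(θ_{0})^{-1}\right), \qquad I(θ) = -\mathbb{E}\left[\frac{\partial^{2}}{\partial θ^{2}} \log p(x; θ)\right]$$

For example, for a Bernoulli model $I(θ) = 1 / (θ (1 - θ))$, so for large $n$ the estimate $\hat{θ}_{ML}$ is approximately normal with mean $θ_{0}$ and variance $θ_{0} (1 - θ_{0}) / n$.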

Invariance: the MLE of $η = g(θ)$ is $\hat{η} = g(\hat{θ})$.

$$η = g(θ), \qquad M(η) = \sup_{\{θ : g(θ) = η\}} L(θ), \qquad \hat{η}_{ML} = \operatorname*{arg\,max}_{η} M(η)$$

Define the likelihood of $η$ as the supremum of $L(θ)$ over all $θ$ that satisfy $g(θ) = η$:

$$M(η) = \sup_{\{θ : g(θ) = η\}} L(θ)$$

The maximizer $\hat{η}$ is the value at which $M(\hat{η})$ attains the supremum of $M(η)$, which in turn equals the supremum of $L(θ)$:

$$M(\hat{η}) = \sup_{η} M(η) = \sup_{η} \left( \sup_{\{θ : g(θ) = η\}} L(θ) \right) = \sup_{θ} L(θ) = L(\hat{θ})$$

Since $\hat{θ}$ itself satisfies $g(θ) = g(\hat{θ})$ and maximizes $L$ over all $θ$, the likelihood of $η = g(\hat{θ})$ is:

$$M(g(\hat{θ})) = \sup_{\{θ : g(θ) = g(\hat{θ})\}} L(θ) = L(\hat{θ})$$

which means $M(g(\hat{θ})) = \sup_{η} M(η)$, so $g(\hat{θ})$ maximizes $M$ and we can take $\hat{η} = g(\hat{θ})$.
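The argument can be checked numerically on a small case. The sketch below assumes a Bernoulli model and takes $g(θ) = θ(1 - θ)$, the Bernoulli variance. This choice is not one-to-one, so $M(η)$ really is a supremum over two values of $θ$. The dataset, the choice of $g$, and the grid are illustrative assumptions.

```python
import math

# Hypothetical sample: 7 ones out of 10 observations.
data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
ones, n = sum(data), len(data)

def log_L(theta):
    """Bernoulli log-likelihood of the sample."""
    return ones * math.log(theta) + (n - ones) * math.log(1.0 - theta)

def g(theta):
    """Parameter of interest: the Bernoulli variance (not one-to-one in theta)."""
    return theta * (1.0 - theta)

def log_M(eta):
    """Log of M(eta): the larger log-likelihood over the two roots of theta * (1 - theta) = eta."""
    root = math.sqrt(1.0 - 4.0 * eta)
    return max(log_L((1.0 + root) / 2.0), log_L((1.0 - root) / 2.0))

theta_hat = ones / n  # closed-form Bernoulli MLE

# Maximize M directly over a grid of eta values in (0, 0.25].
eta_grid = [i / 10000 for i in range(1, 2501)]
eta_hat = max(eta_grid, key=log_M)

print(eta_hat, g(theta_hat))
```

Maximizing $M$ directly over $η$ and plugging $\hat{θ}$ into $g$ give the same value up to floating-point rounding, as the invariance property states.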