STAT 515 Government College University Statistics Worksheet
Homework 4 Written Assignment
Stat 515
Due February 25 at 11:59pm
Averages minimize mean square error
Surveys are (in this class) designed to estimate means over populations. Why
do we care so much about these means? Should we report some other number
summarizing the population instead? For example, why not report the mode or
the median? The problems below explain why one might prefer one summary
over another.
Let $x_1, \dots, x_N \in \mathbb{R}$ be measurements of some property over a population
of $N$ individuals. For example, suppose that $x_1, \dots, x_N$ are the heights of the
students in this class. Let
$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$
be the population mean.
Now suppose that we are given an alternative summary $m$ of the population
such as the median. How can we compare the quality of $m$ and $\mu$ as summaries
of the population? For any $m \in \mathbb{R}$, we define the mean square error
$$\mathrm{MSE}(m) = \frac{1}{N} \sum_{i=1}^{N} (x_i - m)^2.$$
The mean square error is a measure of the deviation between $m$ and the observations $x_1, \dots, x_N$. When the mean square error is low, one can say that the
summary faithfully represents the population. In the problem below, you will
show that the population mean is the summary that minimizes the mean square
error.
Problem 1.
(a) (4 points) Show that $\mathrm{MSE}(m) = (m - \mu)^2 + \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$. Hint:
$\mathrm{MSE}(m)$ is a quadratic function of $m$. Complete the square.
(b) (2 points) Draw the graph of $\mathrm{MSE}$ as a function of $m$. Label $\mu$ on the
$m$-axis of the graph. Label $\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$ on the $\mathrm{MSE}$-axis. What is
the minimizer of $\mathrm{MSE}$?
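Before writing the proof, it can help to check the claim numerically. The sketch below is only a sanity check, not a substitute for the argument in part (a). The data `x` are made-up heights in centimetres, and the helper name `mse` and the grid of candidate summaries are assumptions chosen for illustration.

```python
from statistics import mean, median

# Hypothetical heights in centimetres; any real-valued data would do.
x = [160.0, 172.5, 168.0, 181.0, 175.5]
mu = mean(x)


def mse(m):
    """Mean square error of the summary m for the data x."""
    return sum((xi - m) ** 2 for xi in x) / len(x)


# The constant term in part (a): the mean square deviation from mu.
spread = mse(mu)

# Check MSE(m) = (m - mu)^2 + spread at a few candidate summaries.
for m in [150.0, mu, median(x), 190.0]:
    assert abs(mse(m) - ((m - mu) ** 2 + spread)) < 1e-9

# Scan candidate summaries from 150.00 to 190.00 in steps of 0.01
# and keep the one with the smallest mean square error.
grid = [150.0 + 0.01 * k for k in range(4001)]
best = min(grid, key=mse)

print("population mean:", mu)
print("grid minimizer of MSE:", best)
```

The grid minimizer should agree with the population mean up to the grid spacing. A grid search cannot prove the result; it only makes the shape of the parabola concrete.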
Expectations of random variables have properties similar to population means.
(This should not surprise you, given that the population mean is the expectation of a single observation drawn uniformly at random from the population.)
In particular, the expectation of a random variable minimizes a generalization
of the mean square error.
Problem 2. Let $X$ be a random variable and define the expected square error
$$\mathrm{ESE}(m) = E[(X - m)^2].$$
(a) (2 points) Show that $\mathrm{ESE}(m) = (m - E[X])^2 + \operatorname{var}(X)$. Hint: Complete
the square as above. Use linearity of expectation.
(b) (1 point) What is the minimizer of $\mathrm{ESE}(m)$?
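The same kind of numerical check works for a random variable. The sketch below uses a small discrete distribution so that every expectation is an exact finite sum. The values, the probabilities, and the helper names `expect` and `ese` are all assumptions made up for this illustration.

```python
# Hypothetical discrete random variable X: its values and their probabilities.
values = [0.0, 1.0, 2.0, 5.0]
probs = [0.1, 0.4, 0.3, 0.2]


def expect(g):
    """E[g(X)] for the discrete random variable defined above."""
    return sum(p * g(v) for v, p in zip(values, probs))


ex = expect(lambda v: v)                # E[X]
var = expect(lambda v: (v - ex) ** 2)   # var(X)


def ese(m):
    """Expected square error E[(X - m)^2]."""
    return expect(lambda v: (v - m) ** 2)


# Check ESE(m) = (m - E[X])^2 + var(X) at a few values of m.
for m in [-1.0, 0.0, ex, 3.5]:
    assert abs(ese(m) - ((m - ex) ** 2 + var)) < 1e-9

print("E[X] =", ex, " var(X) =", var, " ESE(E[X]) =", ese(ex))
```

Note that `expect` is linear in `g`, which is exactly the property the hint in part (a) asks you to use.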
One can show that the median of a population minimizes a different measure
of the error. Define the mean absolute error
$$\mathrm{MAE}(m) = \frac{1}{N} \sum_{i=1}^{N} |x_i - m|.$$
The median of $x_1, \dots, x_N$ minimizes $\mathrm{MAE}(m)$. This is slightly more difficult
to prove. You can get the basic idea by considering the case where $N = 3$.
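The claim about the median can also be checked numerically before you attempt a proof. The sketch below is a rough illustration under assumed data: the list `x` holds made-up heights, with an odd number of entries so that the median is a single data point, and `mae` is a hypothetical helper name.

```python
from statistics import mean, median

# Hypothetical heights in centimetres, with an odd number of observations.
x = [160.0, 172.5, 168.0, 181.0, 175.5]


def mae(m):
    """Mean absolute error of the summary m for the data x."""
    return sum(abs(xi - m) for xi in x) / len(x)


med = median(x)

# Scan candidate summaries from 150.00 to 190.00 in steps of 0.01
# and keep the one with the smallest mean absolute error.
grid = [150.0 + 0.01 * k for k in range(4001)]
best = min(grid, key=mae)

print("median:", med, " grid minimizer of MAE:", best)
print("MAE at median:", mae(med), " MAE at mean:", mae(mean(x)))
```

For these data the mean and the median differ, so the comparison in the last line shows that the mean is not the minimizer of the mean absolute error.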
Problem 3. (6 points extra credit!) Suppose $N = 3$ and assume for convenience
that $x_1 < x_2 < x_3$. In that case, the median is $x_2$. Draw the graph of $\mathrm{MAE}$ and explain why $x_2$ is the minimizer.
Hint: First convince yourself that $\mathrm{MAE}(m)$ is a continuous piecewise linear function of $m$. What is the slope of $\mathrm{MAE}(m)$ for $m < x_1$? How about for $x_1 < m < x_2$? You will see that the slopes of the linear segments of $\mathrm{MAE}(m)$ do not depend on the particular values of $x_1$, $x_2$, and $x_3$. What does this tell you about the graph of $\mathrm{MAE}(m)$ and the minimizer?
Machine learning is all about minimizing various measures of error between a model (or summary) and a population. These homework problems might be the most important in the whole class. Ask your instructor to elaborate! I will note that you have not seen the last of this perspective. When we study correlation and conditional expectations we will generalize these results.