# STAT 515 Government College University Statistics Worksheet

Homework 4 Written Assignment, Stat 515

Spring 2022

Due February 25 at 11:59pm

1 Averages minimize mean square error

Surveys are (in this class) designed to estimate means over populations. Why do we care so much about these means? Should we report some other number summarizing the population instead? For example, why not report the mode or the median? The problems below explain why one might prefer one summary over another.

Let $x_1, \dots, x_N \in \mathbb{R}$ be measurements of some property over a population of $N$ individuals. For example, suppose that $x_1, \dots, x_N$ are the heights of the students in this class. Let

$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$

be the population mean.

Now suppose that we are given an alternative summary $m$ of the population, such as the median. How can we compare the quality of $m$ and $\mu$ as summaries of the population? For any $m \in \mathbb{R}$, we define the mean square error

$$\mathrm{MSE}(m) = \frac{1}{N} \sum_{i=1}^{N} (x_i - m)^2.$$

The mean square error is a measure of the deviation between $m$ and the observations $x_1, \dots, x_N$. When the mean square error is low, one can say that the summary faithfully represents the population. In the problem below, you will show that the population mean is the summary that minimizes the mean square error.
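
As an informal sanity check (not a substitute for the proof asked for below), the following Python sketch evaluates the mean square error on a grid of candidate summaries and compares the grid minimizer with the population mean. The data values, the grid, and the helper name `mse` are assumptions made up for illustration.

```python
from statistics import fmean, median

def mse(xs, m):
    """Mean square error of the summary m for the population xs."""
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Hypothetical heights in centimetres (made-up data, not from the class).
heights = [160.0, 165.0, 170.0, 172.0, 188.0]
mu = fmean(heights)

# Candidate summaries 150.0, 150.1, ..., 190.0 (an arbitrary grid).
candidates = [150 + k / 10 for k in range(401)]
best = min(candidates, key=lambda m: mse(heights, m))

print("population mean:", mu)
print("grid minimizer of MSE:", best)
print("MSE at the mean:", mse(heights, mu))
print("MSE at the median:", mse(heights, median(heights)))
```

On this data set the mean and the median differ, so the last two printed values show how much mean square error is lost by reporting the median instead of the mean.
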

Problem 1.

(a) (4 points) Show that $\mathrm{MSE}(m) = (m - \mu)^2 + \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$. Hint: $\mathrm{MSE}(m)$ is a quadratic function of $m$. Complete the square.

(b) (2 points) Draw the graph of $\mathrm{MSE}$ as a function of $m$. Label $\mu$ on the $m$-axis of the graph. Label $\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$ on the $\mathrm{MSE}$-axis. What is the minimizer of $\mathrm{MSE}$?

Expectations of random variables have properties similar to population means. (This should not surprise you, given that the population mean is the expectation of a single observation drawn uniformly at random from the population.) In particular, the expectation of a random variable minimizes a generalization of the mean square error.

Problem 2. Let $X$ be a random variable and define the expected square error

$$\mathrm{ESE}(m) = E[(X - m)^2].$$

(a) (2 points) Show that $\mathrm{ESE}(m) = (m - E[X])^2 + \operatorname{var}(X)$. Hint: Complete the square as above. Use linearity of expectation.

(b) (1 point) What is the minimizer of $\mathrm{ESE}(m)$?
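
Before attempting the proof, it may help to check the identity in part (a) on a concrete example. The sketch below uses a small discrete random variable with exact rational arithmetic. The probability table `pmf` and the helper names `expect` and `ese` are hypothetical, chosen only for illustration, and a check at a few values of $m$ is of course not a proof.

```python
from fractions import Fraction

# Hypothetical discrete random variable: value -> probability (sums to 1).
pmf = {0: Fraction(1, 2), 1: Fraction(1, 3), 4: Fraction(1, 6)}

def expect(f):
    """E[f(X)] for the discrete random variable described by pmf."""
    return sum(p * f(x) for x, p in pmf.items())

mean = expect(lambda x: x)
var = expect(lambda x: (x - mean) ** 2)

def ese(m):
    """Expected square error E[(X - m)^2]."""
    return expect(lambda x: (x - m) ** 2)

# Compare both sides of the identity at a few arbitrary values of m.
for m in [Fraction(-1), Fraction(0), mean, Fraction(5, 2)]:
    lhs = ese(m)
    rhs = (m - mean) ** 2 + var
    print(f"m = {m}: ESE(m) = {lhs}, (m - E[X])^2 + var(X) = {rhs}")
```

Using `Fraction` avoids floating-point rounding, so the two sides can be compared for exact equality.
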

One can show that the median of a population minimizes a different measure of the error. Define the mean absolute error

$$\mathrm{MAE}(m) = \frac{1}{N} \sum_{i=1}^{N} |x_i - m|.$$

The median of $x_1, \dots, x_N$ minimizes $\mathrm{MAE}(m)$. This is slightly more difficult to prove. You can get the basic idea by considering the case where $N = 3$.

Problem 3. (6 points extra credit!) Suppose $N = 3$ and assume for convenience that $x_1 < x_2 < x_3$. In that case, the median is $x_2$. Draw the graph of $\mathrm{MAE}$ and explain why $x_2$ is the minimizer. Hint: First convince yourself that $\mathrm{MAE}(m)$ is a continuous piecewise linear function of $m$. What is the slope of $\mathrm{MAE}(m)$ for $m < x_1$? How about for $x_1 < m < x_2$? You will see that the slopes of the linear segments of $\mathrm{MAE}(m)$ do not depend on the particular values of $x_1$, $x_2$, and $x_3$. What does this tell you about the graph of $\mathrm{MAE}(m)$ and the minimizer?
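
The statement that the median minimizes the mean absolute error can also be checked numerically for $N = 3$. The Python sketch below is only an illustration under assumed, made-up values of $x_1, x_2, x_3$ and an arbitrary grid of candidates. It does not replace the graphical argument asked for in Problem 3.

```python
from statistics import fmean, median

def mae(xs, m):
    """Mean absolute error of the summary m for the population xs."""
    return sum(abs(x - m) for x in xs) / len(xs)

# Hypothetical population with x1 < x2 < x3 (made-up values).
xs = [1.0, 2.0, 7.0]

# Candidate summaries 0.00, 0.01, ..., 10.00 (an arbitrary grid).
candidates = [k / 100 for k in range(1001)]
best = min(candidates, key=lambda m: mae(xs, m))

print("median:", median(xs))
print("grid minimizer of MAE:", best)
print("MAE at the median:", mae(xs, median(xs)))
print("MAE at the mean:", mae(xs, fmean(xs)))
```

Changing the values in `xs` while keeping their order is a quick way to explore how the graph of the mean absolute error responds.
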

Machine learning is all about minimizing various measures of error between a model (or summary) and a population. These homework problems might be the most important in the whole class. Ask your instructor to elaborate!

I will note that you have not seen the last of this perspective. When we study correlation and conditional expectations, we will generalize these results.