# SPU Scatterplot Two Quantitative Variables Question

1 assignment . Excel Instructions: ScatterplotAs you can see, our dataset contains the following variables:

gender: 0 = male, 1 = female.

height: in inches.

weight: in pounds.

First we will create a scatterplot to examine how weight is related to height, ignoring gender.

To do that in Excel:

•

•

•

•

•

•

•

•

•

Sort the data by gender:

Hold down the Control key (Command key on MacOS) and click the A key to select all

of the data in the worksheet.

Select the Home tab, then the Editing group Sort & Filter -> Custom Sort.

In the pop-up window, make sure that My list has headers box is checked and then

choose gender from the pull-down menu next to Sort by. Click OK.

Now select all of the data in columns B and C, select the Insert tab and in

the Charts group choose Scatter.

Choose the first scatterplot option (Scatter with only Markers).

Now we have a scatterplot, but the data is all on the right of the plot. To fix this:

Right-click on the x-axis, and choose Format Axis from the pop-up menu.

Make sure that Axis Options is selected on the left, and then next

to Minimum enter 50 in the textbox. Click the X button to close the menu.

Unit 3 Assignment 1: Scatterplot

In this exercise we will:

• Learn how to create a scatterplot.

• Use the scatterplot to examine the relationship between two quantitative variables.

• Learn how to create a labeled scatterplot.

• Use the labeled scatterplot to better understand the form of a relationship.

In this activity we explore the relationship between weight and height for 81 adults. We will use height as

the explanatory variable. Weight is the response variable.

We will then label the men and women by adding the categorical variable gender to the scatterplot. We

will see if separating the groups contributes to our understanding of the form of the relationship between

height and weight.

Question 1:

Describe the relationship between the height and weight of the subjects. To describe the relationship

write about the pattern (direction, form, and strength) and any deviations from the pattern (outliers).

So far we have studied the relationship between height and weight for all of the males and females

together. It may be interesting to examine whether the relationship between height and weight is different

for males and females. To visualize the effect of the third variable, gender, we will indicate in the

scatterplot which observations are males and which are females.

Instructions

Click on the link corresponding to see instructions for completing the activity, and then answer the

questions below.

Excel

Question 2:

Compare and contrast the relationship between height and weight for males and females. To compare

and contrast the relationships by gender write about the pattern (direction, form, and strength) and any

deviations from the pattern (outliers) for each group.

Discuss how the patterns for the two groups are similar and how they are different.

gender

height

0

0

0

1

1

1

1

1

0

0

0

0

0

1

1

1

0

0

1

0

1

1

1

1

1

1

1

1

1

0

0

0

1

1

1

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

72

67

65

67

63

54

66

64

72

66

66

71

71

57

66

67

70

73

68

72

65

70

64

64

63

60

69

65

67

67

68

65

62

66

65

63

73

69

70

72

73

69

68

71

71

68

69

67

66

67

72

weight

155

145

125

120

105

120

125

125

160

133

175

205

175

82

125

133

175

163

133

180

107

170

110

140

110

110

125

120

180

120

140

130

122

114

115

125

195

135

145

170

172

168

155

185

175

158

185

146

135

150

160

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

1

68

75

68

73

72

72

72

72

74

68

73

68

70

72

70

67

67

71

72

73

68

72

68

67

70

71

70

67

58

56

155

230

149

240

170

198

163

230

170

151

220

145

130

160

210

145

185

237

205

147

170

181

150

150

200

175

155

167

100

100

Unit 3 Assignment 2: Linear Relationships

In this activity we will:

• Learn how to compute the correlation.

• Practice interpreting the value of the correlation.

• See an example of how including an outlier can increase the correlation.

Recall the following example: The average gestation period, or time of pregnancy, of an animal is closely

related to its longevity—the length of its lifespan. Data on the average gestation period and longevity (in

captivity) of 40 different species of animals have been recorded.

Instructions

Click on the link corresponding to see instructions for completing the activity, and then answer the

questions below.

Remember that the correlation is only an appropriate measure of the linear relationship between two

quantitative variables. First produce a scatterplot to verify that gestation and longevity are nearly linear in

their relationship.

Instructions

Click on the link to see instructions for completing the activity, and then answer the questions below.

Here’s a reminder of how to do this in Excel:

•

•

Select all of the data in columns B and C, and then in the Insert tab choose Scatter in

the Charts group.

Choose the first scatterplot option (Scatter with only Markers).

Observe that the relationship between gestation period and longevity is linear and positive. Now we will

compute the correlation between gestation period and longevity.

Instructions

Click on the link corresponding to see instructions for completing the activity, and then answer the

questions below.

To do that in Excel:

•

•

Click in a cell outside of the first three columns of data.

Type =correl(B2:B41,C2:C41)

Question 1:

Report the correlation between gestation and longevity and comment on the strength and direction of the

relationship. Interpret your findings in context.

Now return to the scatterplot that you created earlier. Notice that there is an outlier in both longevity (40

years) and gestation (645 days). Note: This outlier corresponds to the longevity and gestation period of

the elephant.

What do you think will happen to the correlation if we remove this outlier?

Instructions

Click on the link corresponding to see instructions for completing the activity, and then answer the

questions below.

To do this in Excel:

•

•

•

Scroll down to row 16 of the data. You will see that this contains the values of

the variables for the elephant.

Click on the row header 16 to select the entire row of data.

Right-click and choose Delete from the pop-up menu to delete the row.

You will see that the values of the variables for the elephant have been removed from

the data. Notice also that the correlation between gestation and longevity has changed.

Question 2:

Report the new value for the correlation between gestation and longevity and compare it to the value you

found earlier when the outlier was included. What is it about this outlier that results in the fact that its

inclusion in the data causes the correlation to increase? (Hint: look at the scatterplot.)

Comment

In the last activity, we saw an example where there was a positive linear relationship between the two

variables, and including the outlier just “strengthened” it. Consider the hypothetical data displayed by the

following scatterplot:

In this case, the low outlier gives an “illusion” of a positive linear relationship, whereas in reality, there is

no linear relationship between X and Y.

animal

gestation longevity

baboon

187

20

bear, black

219

18

bear, grizzly

225

25

bear, polar

240

20

beaver

122

5

buffalo

278

15

camel

406

12

cat

63

12

chimpanzee

231

20

chipmunk

31

6

cow

284

15

deer

201

8

dog

61

12

donkey

365

12

elephant

645

40

elk

250

15

fox

52

7

giraffe

425

10

goat

151

8

gorilla

257

20

guinea pig

68

4

hippopotamus

238

25

horse

330

20

kangaroo

42

7

leopard

98

12

lion

100

15

monkey

164

15

moose

240

12

mouse

21

3

opossum

15

1

pig

112

10

puma

90

12

rabbit

31

5

rhinoceros

450

15

sea lion

350

12

sheep

154

12

squirrel

44

10

tiger

105

16

wolf

63

5

zebra

365

15

Year

1896

1900

1904

1908

1912

1920

1924

1928

1932

1936

1948

1952

1956

1960

1964

1968

1972

1976

1980

1984

1988

1992

1996

2000

Time

273.2

246

245.4

243.4

236.8

241.8

233.6

233.2

231.2

227.8

229.8

225.1

221.2

215.6

218.1

214.9

216.3

219.2

218.4

212.53

215.96

220.12

215.78

212.32

1 assignment . Excel Instructions: Scatterplot

As you can see, our dataset contains the following variables:

gender: 0 = male, 1 = female.

height: in inches.

weight: in pounds.

First we will create a scatterplot to examine how weight is related to height, ignoring gender.

To do that in Excel:

•

•

•

•

•

•

•

•

•

Sort the data by gender:

Hold down the Control key (Command key on MacOS) and click the A key to select all

of the data in the worksheet.

Select the Home tab, then the Editing group Sort & Filter -> Custom Sort.

In the pop-up window, make sure that My list has headers box is checked and then

choose gender from the pull-down menu next to Sort by. Click OK.

Now select all of the data in columns B and C, select the Insert tab and in

the Charts group choose Scatter.

Choose the first scatterplot option (Scatter with only Markers).

Now we have a scatterplot, but the data is all on the right of the plot. To fix this:

Right-click on the x-axis, and choose Format Axis from the pop-up menu.

Make sure that Axis Options is selected on the left, and then next

to Minimum enter 50 in the textbox. Click the X button to close the menu.

Unit 3 Assignment 1: Scatterplot

In this exercise we will:

• Learn how to create a scatterplot.

• Use the scatterplot to examine the relationship between two quantitative variables.

• Learn how to create a labeled scatterplot.

• Use the labeled scatterplot to better understand the form of a relationship.

In this activity we explore the relationship between weight and height for 81 adults. We will use height as

the explanatory variable. Weight is the response variable.

We will then label the men and women by adding the categorical variable gender to the scatterplot. We

will see if separating the groups contributes to our understanding of the form of the relationship between

height and weight.

Question 1:

Describe the relationship between the height and weight of the subjects. To describe the relationship

write about the pattern (direction, form, and strength) and any deviations from the pattern (outliers).

So far we have studied the relationship between height and weight for all of the males and females

together. It may be interesting to examine whether the relationship between height and weight is different

for males and females. To visualize the effect of the third variable, gender, we will indicate in the

scatterplot which observations are males and which are females.

Instructions

Click on the link corresponding to see instructions for completing the activity, and then answer the

questions below.

Excel

Question 2:

Compare and contrast the relationship between height and weight for males and females. To compare

and contrast the relationships by gender write about the pattern (direction, form, and strength) and any

deviations from the pattern (outliers) for each group.

Discuss how the patterns for the two groups are similar and how they are different.

gender

height

0

0

0

1

1

1

1

1

0

0

0

0

0

1

1

1

0

0

1

0

1

1

1

1

1

1

1

1

1

0

0

0

1

1

1

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

72

67

65

67

63

54

66

64

72

66

66

71

71

57

66

67

70

73

68

72

65

70

64

64

63

60

69

65

67

67

68

65

62

66

65

63

73

69

70

72

73

69

68

71

71

68

69

67

66

67

72

weight

155

145

125

120

105

120

125

125

160

133

175

205

175

82

125

133

175

163

133

180

107

170

110

140

110

110

125

120

180

120

140

130

122

114

115

125

195

135

145

170

172

168

155

185

175

158

185

146

135

150

160

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

1

68

75

68

73

72

72

72

72

74

68

73

68

70

72

70

67

67

71

72

73

68

72

68

67

70

71

70

67

58

56

155

230

149

240

170

198

163

230

170

151

220

145

130

160

210

145

185

237

205

147

170

181

150

150

200

175

155

167

100

100

Unit 3 Assignment 2: Linear Relationships

In this activity we will:

• Learn how to compute the correlation.

• Practice interpreting the value of the correlation.

• See an example of how including an outlier can increase the correlation.

Recall the following example: The average gestation period, or time of pregnancy, of an animal is closely

related to its longevity—the length of its lifespan. Data on the average gestation period and longevity (in

captivity) of 40 different species of animals have been recorded.

Instructions

Click on the link corresponding to see instructions for completing the activity, and then answer the

questions below.

Remember that the correlation is only an appropriate measure of the linear relationship between two

quantitative variables. First produce a scatterplot to verify that gestation and longevity are nearly linear in

their relationship.

Instructions

Click on the link to see instructions for completing the activity, and then answer the questions below.

Here’s a reminder of how to do this in Excel:

•

•

Select all of the data in columns B and C, and then in the Insert tab choose Scatter in

the Charts group.

Choose the first scatterplot option (Scatter with only Markers).

Observe that the relationship between gestation period and longevity is linear and positive. Now we will

compute the correlation between gestation period and longevity.

Instructions

Click on the link corresponding to see instructions for completing the activity, and then answer the

questions below.

To do that in Excel:

•

•

Click in a cell outside of the first three columns of data.

Type =correl(B2:B41,C2:C41)

Question 1:

Report the correlation between gestation and longevity and comment on the strength and direction of the

relationship. Interpret your findings in context.

Now return to the scatterplot that you created earlier. Notice that there is an outlier in both longevity (40

years) and gestation (645 days). Note: This outlier corresponds to the longevity and gestation period of

the elephant.

What do you think will happen to the correlation if we remove this outlier?

Instructions

Click on the link corresponding to see instructions for completing the activity, and then answer the

questions below.

To do this in Excel:

•

•

•

Scroll down to row 16 of the data. You will see that this contains the values of

the variables for the elephant.

Click on the row header 16 to select the entire row of data.

Right-click and choose Delete from the pop-up menu to delete the row.

You will see that the values of the variables for the elephant have been removed from

the data. Notice also that the correlation between gestation and longevity has changed.

Question 2:

Report the new value for the correlation between gestation and longevity and compare it to the value you

found earlier when the outlier was included. What is it about this outlier that results in the fact that its

inclusion in the data causes the correlation to increase? (Hint: look at the scatterplot.)

Comment

In the last activity, we saw an example where there was a positive linear relationship between the two

variables, and including the outlier just “strengthened” it. Consider the hypothetical data displayed by the

following scatterplot:

In this case, the low outlier gives an “illusion” of a positive linear relationship, whereas in reality, there is

no linear relationship between X and Y.

Unit 3 Assignment 3: Linear Regression

In this activity we will:

•

•

•

Find a regression line and plot it on the scatterplot.

Examine the effect of outliers on the regression line.

Use the regression line to make predictions and evaluate how reliable these predictions are.

Background

The modern Olympic Games have changed dramatically since their inception in 1896. For example, many

commentators have remarked on the change in the quality of athletic performances from year to year.

Regression will allow us to investigate the change in winning times for one event—the 1,500 meter race.

Instructions

Click on the link corresponding to see instructions for completing the activity, and then answer the

questions below.

Excel Instructions: Linear Regression

To open Excel with the data in the worksheet, click and it will automatically

download the file to your computer. Find the downloaded file (usually in the

Downloads folder) and double-click it to open it in Excel.

Our dataset contains the following variables:

Year: the year of the Olympic Games, from 1896 to 2000.

Time: the winning time for the 1,500 meter race, in seconds.

First, let’s explore the relationship between the two quantitative variables—year and

time. Produce a scatterplot and use it to verify that year and time are nearly linear in

their relationship.

Here’s a reminder of how to do this in Excel:

•

•

•

•

Select all of the data in columns A and B, and then select the Insert tab and

choose Scatter in the Charts group next to the Recommended Charts button.

Choose the first scatterplot option (Scatter with only Markers).

Now we have a scatterplot, but the data is all towards the top of the plot. To fix

this:

Right-click on the Y axis, and choose Format Axis from the bottom of the

pop-up menu.

•

In the Format Axis window make sure that Axis Options button is selected,

find the Minimum option and enter 200 in the textbox. Click the X button to

close the menu window.

Observe that the form of the relationship between the 1,500 meter race’s winning time and the year is

linear. The least squares regression line is therefore an appropriate way to summarize the relationship

and examine the change in winning times over the course of the last century. We will now find the least

squares regression line and plot it on a scatterplot.

Instructions

Click on the link corresponding to see instructions for completing the activity, and then answer the

questions below.

To do that in Excel:

•

•

•

Right-click on one of the points in the graph and choose Add Trendline from

the pop-up menu.

Make sure that the Trendline Options tab is selected in the Format

Trendline menu box displayed on the right, and then scroll down and check the

boxes next to Display Equation on chart and Display R-squared value on

chart.

Click the X button on the menu window to close it.

(You should see a graph window with the scatterplot and the regression line plotted on

it. The regression line equation appears on the graph next to the line itself.)

Question 1:

Give the equation for the least squares regression line, and interpret it in context.

Instructions

Click on the link corresponding to see instructions for completing the activity, and then answer the

questions below.

Our dataset contains the following variables:

Year: the year of the Olympic Games, from 1896 to 2000.

Time: the winning time for the 1,500 meter race, in seconds.

First, let’s explore the relationship between the two quantitative variables—year and

time. Produce a scatterplot and use it to verify that year and time are nearly linear in

their relationship.

Here’s a reminder of how to do this in Excel:

•

•

•

•

•

Select all of the data in columns A and B, and then select the Insert tab and

choose Scatter in the Charts group next to the Recommended Charts button.

Choose the first scatterplot option (Scatter with only Markers).

Now we have a scatterplot, but the data is all towards the top of the plot. To fix

this:

Right-click on the Y axis, and choose Format Axis from the bottom of the

pop-up menu.

In the Format Axis window make sure that Axis Options button is selected,

find the Minimum option and enter 200 in the textbox. Click the X button to

close the menu window.

Question 2:

Give the equation for this new line and compare it with the line you found for the whole dataset,

commenting on the effect of the outlier.

Question 3:

Our least squares regression line associates years as an explanatory variable, with times in the 1,500

meter race as the response variable. Use the least squares regression line you found in question 2 to

predict the 1,500 meter time in the 2008 Olympic Games in Beijing. Comment on your prediction.

animal

gestation longevity

baboon

187

20

bear, black

219

18

bear, grizzly

225

25

bear, polar

240

20

beaver

122

5

buffalo

278

15

camel

406

12

cat

63

12

chimpanzee

231

20

chipmunk

31

6

cow

284

15

deer

201

8

dog

61

12

donkey

365

12

elephant

645

40

elk

250

15

fox

52

7

giraffe

425

10

goat

151

8

gorilla

257

20

guinea pig

68

4

hippopotamus

238

25

horse

330

20

kangaroo

42

7

leopard

98

12

lion

100

15

monkey

164

15

moose

240

12

mouse

21

3

opossum

15

1

pig

112

10

puma

90

12

rabbit

31

5

rhinoceros

450

15

sea lion

350

12

sheep

154

12

squirrel

44

10

tiger

105

16

wolf

63

5

zebra

365

15

Year

1896

1900

1904

1908

1912

1920

1924

1928

1932

1936

1948

1952

1956

1960

1964

1968

1972

1976

1980

1984

1988

1992

1996

2000

Time

273.2

246

245.4

243.4

236.8

241.8

233.6

233.2

231.2

227.8

229.8

225.1

221.2

215.6

218.1

214.9

216.3

219.2

218.4

212.53

215.96

220.12

215.78

212.32