# NYU Regression Analysis STATA Worksheet

Please answer every question. Note that Problem 3 has multiple parts. Use Stata and justify your answers.

| datenum | datetext | sp_tr | riskfree | hedge | exhedge | exsptr | exsptr1 | exsptr2 | exsptr3 |
|---|---|---|---|---|---|---|---|---|---|
| 9/30/2000 | 09/2000 | -5.30% | 0.50% | -0.41% | -0.91% | -5.80% | 5.69% | -2.07% | 1.97% |
| 8/31/2000 | 08/2000 | 6.19% | 0.51% | 3.39% | 2.88% | 5.69% | -2.07% | 1.97% | -2.56% |
| 7/31/2000 | 07/2000 | -1.58% | 0.49% | 0.11% | -0.38% | -2.07% | 1.97% | -2.56% | -3.50% |
| 6/30/2000 | 06/2000 | 2.45% | 0.48% | 3.66% | 3.18% | 1.97% | -2.56% | -3.50% | 9.29% |
| 5/31/2000 | 05/2000 | -2.07% | 0.49% | -1.17% | -1.66% | -2.56% | -3.50% | 9.29% | -2.37% |
| 4/30/2000 | 04/2000 | -3.03% | 0.47% | -4.63% | -5.10% | -3.50% | 9.29% | -2.37% | -5.49% |
| 3/31/2000 | 03/2000 | 9.77% | 0.48% | -2.12% | -2.60% | 9.29% | -2.37% | -5.49% | 5.44% |
| 2/29/2000 | 02/2000 | -1.91% | 0.46% | 6.49% | 6.03% | -2.37% | -5.49% | 5.44% | 1.59% |
| 1/31/2000 | 01/2000 | -5.04% | 0.44% | -0.10% | -0.54% | -5.49% | 5.44% | 1.59% | 5.91% |
| 12/31/1999 | 12/1999 | 5.87% | 0.44% | 8.53% | 8.09% | 5.44% | 1.59% | 5.91% | -3.15% |
| 11/30/1999 | 11/1999 | 2.02% | 0.42% | 4.96% | 4.54% | 1.59% | 5.91% | -3.15% | -0.91% |
| 10/31/1999 | 10/1999 | 6.31% | 0.40% | 2.37% | 1.97% | 5.91% | -3.15% | -0.91% | -3.52% |
| 9/30/1999 | 09/1999 | -2.76% | 0.39% | -0.32% | -0.71% | -3.15% | -0.91% | -3.52% | 5.15% |
| 8/31/1999 | 08/1999 | -0.51% | 0.40% | -0.90% | -1.30% | -0.91% | -3.52% | 5.15% | -2.75% |
| 7/31/1999 | 07/1999 | -3.14% | 0.38% | 0.26% | -0.12% | -3.52% | 5.15% | -2.75% | 3.50% |
| 6/30/1999 | 06/1999 | 5.53% | 0.38% | 3.28% | 2.90% | 5.15% | -2.75% | 3.50% | 3.61% |
| 5/31/1999 | 05/1999 | -2.38% | 0.38% | 0.13% | -0.25% | -2.75% | 3.50% | 3.61% | -3.50% |
| 4/30/1999 | 04/1999 | 3.86% | 0.36% | 2.63% | 2.27% | 3.50% | 3.61% | -3.50% | 3.80% |
| 3/31/1999 | 03/1999 | 3.98% | 0.37% | 1.22% | 0.85% | 3.61% | -3.50% | 3.80% | 5.38% |
| 2/28/1999 | 02/1999 | -3.12% | 0.37% | -1.31% | -1.68% | -3.50% | 3.80% | 5.38% | 5.67% |
| 1/31/1999 | 01/1999 | 4.16% | 0.36% | 0.80% | 0.44% | 3.80% | 5.38% | 5.67% | 7.78% |
| 12/31/1998 | 12/1998 | 5.75% | 0.37% | 3.03% | 2.66% | 5.38% | 5.67% | 7.78% | 5.99% |
| 11/30/1998 | 11/1998 | 6.04% | 0.37% | 1.36% | 0.99% | 5.67% | 7.78% | 5.99% | -14.89% |
| 10/31/1998 | 10/1998 | 8.12% | 0.34% | -4.57% | -4.91% | 7.78% | 5.99% | -14.89% | -1.50% |
| 9/30/1998 | 09/1998 | 6.39% | 0.40% | -2.31% | -2.71% | 5.99% | -14.89% | -1.50% | 3.63% |
| 8/31/1998 | 08/1998 | -14.47% | 0.41% | -7.55% | -7.96% | -14.89% | -1.50% | 3.63% | -2.15% |
| 7/31/1998 | 07/1998 | -1.08% | 0.41% | 0.90% | 0.49% | -1.50% | 3.63% | -2.15% | 0.57% |
| 6/30/1998 | 06/1998 | 4.05% | 0.42% | 1.59% | 1.17% | 3.63% | -2.15% | 0.57% | 4.69% |
| 5/31/1998 | 05/1998 | -1.74% | 0.42% | 0.26% | -0.16% | -2.15% | 0.57% | 4.69% | 6.77% |
| 4/30/1998 | 04/1998 | 0.99% | 0.42% | 0.95% | 0.53% | 0.57% | 4.69% | 6.77% | 0.67% |
| 3/31/1998 | 03/1998 | 5.10% | 0.42% | 5.94% | 5.52% | 4.69% | 6.77% | 0.67% | 1.27% |
| 2/28/1998 | 02/1998 | 7.20% | 0.43% | 1.96% | 1.53% | 6.77% | 0.67% | 1.27% | 4.18% |
| 1/31/1998 | 01/1998 | 1.09% | 0.42% | -1.21% | -1.63% | 0.67% | 1.27% | 4.18% | -3.77% |
| 12/31/1997 | 12/1997 | 1.70% | 0.43% | 3.22% | 2.79% | 1.27% | 4.18% | -3.77% | 5.05% |
| 11/30/1997 | 11/1997 | 4.61% | 0.43% | 1.00% | 0.57% | 4.18% | -3.77% | 5.05% | -6.05% |
| 10/31/1997 | 10/1997 | -3.36% | 0.41% | -1.64% | -2.05% | -3.77% | 5.05% | -6.05% | 7.52% |
| 9/30/1997 | 09/1997 | 5.46% | 0.41% | 3.99% | 3.58% | 5.05% | -6.05% | 7.52% | 4.05% |
| 8/31/1997 | 08/1997 | -5.62% | 0.43% | -1.26% | -1.69% | -6.05% | 7.52% | 4.05% | 5.64% |
| 7/31/1997 | 07/1997 | 7.94% | 0.42% | 6.99% | 6.57% | 7.52% | 4.05% | 5.64% | 5.52% |
| 6/30/1997 | 06/1997 | 4.46% | 0.41% | 2.26% | 1.85% | 4.05% | 5.64% | 5.52% | -4.55% |
| 5/31/1997 | 05/1997 | 6.07% | 0.43% | 0.88% | 0.45% | 5.64% | 5.52% | -4.55% | 0.35% |
| 4/30/1997 | 04/1997 | 5.95% | 0.43% | 2.86% | 2.43% | 5.52% | -4.55% | 0.35% | 5.81% |
| 3/31/1997 | 03/1997 | -4.13% | 0.43% | -1.41% | -1.84% | -4.55% | 0.35% | 5.81% | -2.40% |
| 2/28/1997 | 02/1997 | 0.77% | 0.42% | 1.30% | 0.88% | 0.35% | 5.81% | -2.40% | 7.12% |
| 1/31/1997 | 01/1997 | 6.23% | 0.42% | 5.48% | 5.06% | 5.81% | -2.40% | 7.12% | 2.32% |
| 12/31/1996 | 12/1996 | -2.00% | 0.41% | 0.31% | -0.10% | -2.40% | 7.12% | 2.32% | 5.18% |
| 11/30/1996 | 11/1996 | 7.54% | 0.42% | 5.04% | 4.62% | 7.12% | 2.32% | 5.18% | 1.67% |
| 10/31/1996 | 10/1996 | 2.74% | 0.42% | 3.35% | 2.93% | 2.32% | 5.18% | 1.67% | -4.87% |
| 9/30/1996 | 09/1996 | 5.61% | 0.43% | 2.50% | 2.07% | 5.18% | 1.67% | -4.87% | -0.06% |
| 8/31/1996 | 08/1996 | 2.09% | 0.42% | 2.48% | 2.06% | 1.67% | -4.87% | -0.06% | 2.14% |
| 7/31/1996 | 07/1996 | -4.43% | 0.43% | -4.13% | -4.56% | -4.87% | -0.06% | 2.14% | 1.04% |
| 6/30/1996 | 06/1996 | 0.36% | 0.43% | 1.74% | 1.31% | -0.06% | 2.14% | 1.04% | 0.53% |
| 5/31/1996 | 05/1996 | 2.56% | 0.42% | 1.93% | 1.51% | 2.14% | 1.04% | 0.53% | 0.50% |
| 4/30/1996 | 04/1996 | 1.46% | 0.42% | 3.12% | 2.70% | 1.04% | 0.53% | 0.50% | 2.97% |
| 3/31/1996 | 03/1996 | 0.95% | 0.41% | 1.06% | 0.65% | 0.53% | 0.50% | 2.97% | 1.48% |
| 2/29/1996 | 02/1996 | 0.91% | 0.41% | -3.59% | -4.00% | 0.50% | 2.97% | 1.48% | 3.93% |
| 1/31/1996 | 01/1996 | 3.39% | 0.42% | 6.97% | 6.55% | 2.97% | 1.48% | 3.93% | -0.82% |
| 12/31/1995 | 12/1995 | 1.91% | 0.43% | 3.06% | 2.63% | 1.48% | 3.93% | -0.82% | 3.76% |
| 11/30/1995 | 11/1995 | 4.37% | 0.45% | 3.14% | 2.69% | 3.93% | -0.82% | 3.76% | -0.22% |
| 10/31/1995 | 10/1995 | -0.37% | 0.44% | -0.11% | -0.55% | -0.82% | 3.76% | -0.22% | 2.84% |
| 9/30/1995 | 09/1995 | 4.20% | 0.44% | 0.38% | -0.06% | 3.76% | -0.22% | 2.84% | 1.85% |
| 8/31/1995 | 08/1995 | 0.23% | 0.45% | 6.00% | 5.55% | -0.22% | 2.84% | 1.85% | 3.51% |
| 7/31/1995 | 07/1995 | 3.30% | 0.46% | 2.76% | 2.30% | 2.84% | 1.85% | 3.51% | 2.46% |
| 6/30/1995 | 06/1995 | 2.31% | 0.46% | 0.37% | -0.09% | 1.85% | 3.51% | 2.46% | 2.46% |
| 5/31/1995 | 05/1995 | 3.98% | 0.47% | 1.01% | 0.54% | 3.51% | 2.46% | 2.46% | 3.40% |
| 4/30/1995 | 04/1995 | 2.93% | 0.47% | 1.45% | 0.98% | 2.46% | 2.46% | 3.40% | 2.09% |
| 3/31/1995 | 03/1995 | 2.93% | 0.48% | 3.48% | 3.00% | 2.46% | 3.40% | 2.09% | 1.00% |
| 2/28/1995 | 02/1995 | 3.88% | 0.48% | 0.45% | -0.03% | 3.40% | 2.09% | 1.00% | -4.10% |
| 1/31/1995 | 01/1995 | 2.58% | 0.48% | -1.97% | -2.45% | 2.09% | 1.00% | -4.10% | 1.82% |
| 12/31/1994 | 12/1994 | 1.47% | 0.47% | -0.21% | -0.68% | 1.00% | -4.10% | 1.82% | -2.85% |
| 11/30/1994 | 11/1994 | -3.66% | 0.44% | 0.41% | -0.03% | -4.10% | 1.82% | -2.85% | 3.71% |
| 10/31/1994 | 10/1994 | 2.23% | 0.41% | -1.35% | -1.76% | 1.82% | -2.85% | 3.71% | 2.90% |
| 9/30/1994 | 09/1994 | -2.46% | 0.39% | 0.67% | 0.28% | -2.85% | 3.71% | 2.90% | -2.82% |
| 8/31/1994 | 08/1994 | 4.08% | 0.37% | 2.77% | 2.40% | 3.71% | 2.90% | -2.82% | 1.28% |
| 7/31/1994 | 07/1994 | 3.27% | 0.36% | 0.35% | -0.01% | 2.90% | -2.82% | 1.28% | 0.96% |
| 6/30/1994 | 06/1994 | -2.47% | 0.35% | -0.81% | -1.16% | -2.82% | 1.28% | 0.96% | -4.67% |
| 5/31/1994 | 05/1994 | 1.62% | 0.35% | 2.23% | 1.88% | 1.28% | 0.96% | -4.67% | -3.00% |
| 4/30/1994 | 04/1994 | 1.27% | 0.31% | -1.74% | -2.05% | 0.96% | -4.67% | -3.00% | 3.13% |
| 3/31/1994 | 03/1994 | -4.38% | 0.29% | -3.57% | -3.86% | -4.67% | -3.00% | 3.13% | 0.94% |
| 2/28/1994 | 02/1994 | -2.73% | 0.27% | -4.09% | -4.36% | -3.00% | 3.13% | 0.94% | -1.23% |
| 1/31/1994 | 01/1994 | 3.38% | 0.25% | 1.14% | 0.89% | 3.13% | 0.94% | -1.23% | 1.80% |

b) [6 points] Devise a naming scheme for eight more variables. These will be for two groups of interactions. The first group will be four interaction variables for up markets. For the first variable in this group, multiply the contemporaneous excess market return (exsptr) by the up-market indicator for the contemporaneous month. For the next variable in this group, multiply the first lag of the excess market return (exsptr1) by the up-market indicator for the first lag. For the next variable, multiply the second lag of the excess market return (exsptr2) by the up-market indicator for the second lag. Follow the same process for the remaining up-market variable.

Follow a similar process for the down-market interaction variables. For the first variable in this group, multiply the contemporaneous excess market return (exsptr) by the down-market indicator for the contemporaneous month. For the next variable in this group, multiply the first lag of the excess market return (exsptr1) by the down-market indicator for the first lag. Create the remaining two variables similarly.

These eight new variables should now be populated with a mix of zeros and return values. Only positive return values (along with the zeros) should appear in the up-market interaction variables. Only negative return values (along with the zeros) should appear in the down-market interaction variables. For credit for this question part, summarize, describe, inspect, view, or otherwise show that you have created these data. As a quantitative reference point, what is the value for September 2000 (top of the list) of the third lag for down markets: a 0 or a negative return value?
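
The interaction terms can be illustrated outside Stata with a minimal Python sketch. It assumes each observation is a dict of excess returns in decimal form. The names upint0 through upint3 and dnint0 through dnint3 are one hypothetical naming scheme, not names required by the assignment.

```python
# Hypothetical naming scheme: upint0..upint3 are up-market interactions and
# dnint0..dnint3 are down-market interactions; the digit is the lag in months.
LAG_NAMES = ["exsptr", "exsptr1", "exsptr2", "exsptr3"]


def add_interactions(row):
    """Return a copy of one observation (a dict) with eight interaction terms.

    Each term equals the excess market return times a 0/1 indicator:
    the up-market term keeps positive returns, the down-market term keeps
    zero or negative returns, and every other entry is 0.
    """
    out = dict(row)
    for lag, name in enumerate(LAG_NAMES):
        r = row[name]
        # Conditional form of r * indicator (avoids producing -0.0).
        out[f"upint{lag}"] = r if r > 0 else 0.0
        out[f"dnint{lag}"] = r if r <= 0 else 0.0
    return out


# September 2000 observation from the supplied listing, in decimals.
sep2000 = {"exsptr": -0.0580, "exsptr1": 0.0569, "exsptr2": -0.0207, "exsptr3": 0.0197}
result = add_interactions(sep2000)
for lag in range(4):
    print(lag, result[f"upint{lag}"], result[f"dnint{lag}"])
```

Applied to every row, this produces the pattern the question describes. The up-market columns contain only positive values and zeros, and the down-market columns contain only non-positive values.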

c) [6 points] Run a regression (robust) of the excess hedge fund returns (exhedge) on the eight interaction terms that you created. In addition to supplying your regression output, list the six numeric values of the items below for grader review. Also answer this question: Do your results essentially match those of the top row of Exhibit 5 on page 14 (PDF page 9) of the article?

- Up markets
  - Contemporaneous beta
  - Sum of the lag 1, lag 2, and lag 3 betas
  - Sum of all four betas
- Down markets
  - Contemporaneous beta
  - Sum of the lag 1, lag 2, and lag 3 betas
  - Sum of all four betas
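
The sketch below shows what `regress ..., robust` computes. It fits OLS with an intercept and uses HC1 heteroskedasticity-robust standard errors, the n/(n - k) scaling Stata applies for `vce(robust)` after `regress`. It then forms the six summaries. It is plain Python and the function names are hypothetical. `beta_sums` assumes the eight slopes are ordered as the four up-market terms (contemporaneous, then lags 1 to 3) followed by the four down-market terms.

```python
def invert(a):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("X'X is singular (perfectly collinear regressors)")
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols_robust(y, X):
    """OLS of y on a constant plus the columns of X, with HC1 standard errors.

    y is a list of n outcomes; X is a list of n rows of regressors (no constant).
    Returns (coefficients, standard errors), with the constant first.
    """
    rows = [[1.0] + list(x) for x in X]
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(bj * rj for bj, rj in zip(b, r)) for r, yi in zip(rows, y)]
    # Sandwich: inv(X'X) * sum(e_i^2 x_i x_i') * inv(X'X), scaled by n / (n - k).
    meat = [[sum(e * e * r[i] * r[j] for r, e in zip(rows, resid)) for j in range(k)]
            for i in range(k)]
    left = [[sum(inv[i][t] * meat[t][j] for t in range(k)) for j in range(k)] for i in range(k)]
    cov = [[sum(left[i][t] * inv[t][j] for t in range(k)) for j in range(k)] for i in range(k)]
    scale = n / (n - k)
    se = [max(scale * cov[i][i], 0.0) ** 0.5 for i in range(k)]  # clip rounding noise
    return b, se


def beta_sums(b):
    """Six summaries from b = [const, up0, up1, up2, up3, dn0, dn1, dn2, dn3]."""
    up, dn = b[1:5], b[5:9]
    return {
        "up_contemp": up[0], "up_lags": sum(up[1:]), "up_total": sum(up),
        "dn_contemp": dn[0], "dn_lags": sum(dn[1:]), "dn_total": sum(dn),
    }
```

In the assignment, y would be exhedge and each row of X would hold the eight interaction terms in the order assumed above. The dictionary returned by `beta_sums` then holds the six requested numbers.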

d) [6 points] Examine the confidence intervals calculated by the software for the two contemporaneous betas. Do the confidence intervals overlap? Using that information, and making an educated guess about the summed betas for up markets and for down markets, does it seem as though the hedge fund managers might be employing a different asset pricing style in up markets than in down markets?
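
The overlap check itself is simple arithmetic. The sketch below assumes each interval is coefficient ± critical value × standard error. `crit` is left as an input because Stata's 95% intervals after `regress` use a t critical value that depends on the residual degrees of freedom (close to 1.99 with roughly 70 of them). Overlapping intervals are only a rough guide. They are not a formal test that the two betas are equal.

```python
def conf_int(coef, se, crit):
    """Two-sided interval from coef - crit * se to coef + crit * se."""
    return (coef - crit * se, coef + crit * se)


def intervals_overlap(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical numbers for illustration only (not results from the data).
up_ci = conf_int(0.80, 0.10, 1.99)
down_ci = conf_int(0.20, 0.10, 1.99)
print(up_ci, down_ci, intervals_overlap(up_ci, down_ci))
```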

e) [6 points] Submit a copy of your code as a PDF file or in a similar Gradescope-acceptable format.

Problem 3 [30 points] In the questions below, you will be asked to further investigate the hedge fund return dataset that you used in the previous two problem sets. Again, you are asked for two deliverables. The first deliverable will be your answer sheet, with verbal answers to some question parts and copies of the relevant portion of your Stata or R commands and results in other question parts. The second deliverable will be a PDF copy (or legibly clear pictures if need be) of the .do or .R file that you create. You may choose to clean up this .do or .R file before submitting the PDF copy of it, but you should ensure that what you submit, and save, actually executes properly.

As a reminder, the authors of “Do Hedge Funds Hedge?” are exploring the notion that hedge fund managers price their securities in a way that makes their risk appear lower than it truly is. Because managers may have some pricing flexibility, perhaps they price differently when the market goes up than when the market goes down. You will “divide” the data into observations when the excess S&P return is positive (up markets) and when it is negative (down markets), and calculate the beta for each group.

a) [6 points] Start with the data supplied with this problem set, which reflects the work through problem set 3. The goal of this step is to create eight new indicator variables. Devise a naming scheme such that each variable name reflects two pieces of information: 1) whether it relates to up markets or down markets, and 2) whether it is for the contemporaneous month, one month prior, two months prior, or three months prior. Thus, there will be four new variables relating to up markets (contemporaneous, lag 1, lag 2, lag 3) and four relating to down markets.

The names of the contemporaneous and lagged excess market return variables in the supplied dataset are:

- exsptr
- exsptr1
- exsptr2
- exsptr3
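
The listing is sorted with the most recent month first, so the k-th lag for a given month sits k rows further down the same column. For example, the September 2000 value of exsptr1 (5.69%) is the August 2000 value of exsptr. The minimal Python sketch below shows that shift using the first six exsptr values in percent. The function name is hypothetical, and months whose lag falls outside this short excerpt are marked None.

```python
def lag_newest_first(series, k):
    """k-th lag of a series stored with the newest observation first.

    Row i's lag is the value k rows further down; rows whose lag falls past the
    end of the list get None (Stata would show a missing value).
    """
    return [series[i + k] if i + k < len(series) else None for i in range(len(series))]


# exsptr (%) for September 2000 back to April 2000, from the supplied listing.
exsptr = [-5.80, 5.69, -2.07, 1.97, -2.56, -3.50]
print(lag_newest_first(exsptr, 1))  # compare with the exsptr1 column
print(lag_newest_first(exsptr, 3))  # compare with the exsptr3 column
```

In the supplied listing, the lag columns for the earliest months are already filled in, so no values are missing there.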

For the first of the eight variables that you create, focus on the contemporaneous excess market return. Using some type of “if” statement in your code, set the new variable to “1” if the excess market return (exsptr) is positive and to “0” if it is zero or negative. For the second new variable, focus on the first lagged excess market return (exsptr1). Set this variable to 1 if the lagged excess market return is positive and to 0 if it is zero or negative. Create the next two variables similarly, focusing on the next two lags of the excess market return.

The other four new variables must be populated with the opposite coding. In other words, these variables are set to 1 if the excess market return was zero or negative during the month, and to 0 if the market had a positive excess return. These are the indicators for a down market.

For credit for this question part, summarize, describe, inspect, view, or otherwise show that you have created these data. As a quantitative reference point, what is the value for September 2000 (top of the list) of the third lag for down markets: a 0 or a 1?
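
The coding rule can be illustrated with a short Python sketch. Each observation is assumed to be a dict of excess returns. The names up0 through up3 and dn0 through dn3 are a hypothetical naming scheme.

```python
# Hypothetical names: up0..up3 and dn0..dn3, where the digit is the lag in months.
LAG_NAMES = ["exsptr", "exsptr1", "exsptr2", "exsptr3"]


def market_indicators(row):
    """Return eight 0/1 indicators for one observation (a dict of returns).

    up{k} is 1 when the k-th lag of the excess market return is positive and 0
    otherwise; dn{k} uses the opposite coding (1 when zero or negative).
    """
    out = {}
    for k, name in enumerate(LAG_NAMES):
        out[f"up{k}"] = 1 if row[name] > 0 else 0
        out[f"dn{k}"] = 1 - out[f"up{k}"]
    return out


# September 2000 observation from the supplied listing (percent).
sep2000 = {"exsptr": -5.80, "exsptr1": 5.69, "exsptr2": -2.07, "exsptr3": 1.97}
print(market_indicators(sep2000))
```

You might code this in Stata with a logical expression such as `exsptr > 0`. If so, note that Stata treats missing values as larger than any number, so a missing return would be coded as an up market unless it is excluded.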

Problem 1 [35 points] You are studying firms in an industry in which the production function is

$$Y = K^{\alpha} L^{\beta} U,$$

for some values of $\alpha$ and $\beta$. You have data on prices (cost of capital $p_K$, wage rate $p_L$, price of output $p_Y$), inputs (capital K and labor L), and output (Y) from a random sample of firms. You may assume that the unobservable U is always positive and is independent of prices and inputs.

(a) [7 points] How would you obtain estimates of $\alpha$ and $\beta$? Describe the steps that you would take if you had access to this dataset. What variables would you construct? What model would you estimate? What are the coefficients of interest?
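
One standard route, sketched here as one possible approach under the problem's assumptions, is to take logs so the model is linear in the parameters. The symbols $\gamma_0$ and $\varepsilon_i$ are introduced only for illustration.

$$\ln Y_i = \alpha \ln K_i + \beta \ln L_i + \ln U_i = \gamma_0 + \alpha \ln K_i + \beta \ln L_i + \varepsilon_i, \qquad \gamma_0 = E[\ln U], \quad \varepsilon_i = \ln U_i - E[\ln U].$$

Because U is independent of the inputs, $\varepsilon_i$ has mean zero and is uncorrelated with $\ln K_i$ and $\ln L_i$. OLS of $\ln Y$ on a constant, $\ln K$, and $\ln L$ therefore estimates $\alpha$ and $\beta$ as the two slope coefficients, and the intercept absorbs $E[\ln U]$.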

(b) [7 points] You want to test the null hypothesis that the production function exhibits increasing or constant returns to scale against the alternative hypothesis that it does not. Express the null hypothesis in terms of $\alpha$ and $\beta$. How would you test this hypothesis? In other words, what test statistic would you construct, and when would you reject the null hypothesis at the 5% significance level?
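
One way to set this up, sketched under stated assumptions, is to write the null as $\alpha + \beta \ge 1$ (constant or increasing returns) against $\alpha + \beta < 1$. The one-sided statistic is then $t = (\hat\alpha + \hat\beta - 1)/\mathrm{se}(\hat\alpha + \hat\beta)$, where the standard error combines both variances and the covariance of the estimates. In Stata, `lincom` reports the estimate and standard error of such a linear combination of coefficients. The Python function below is a hypothetical helper that takes those quantities as inputs and uses a large-sample normal critical value.

```python
from statistics import NormalDist


def returns_to_scale_test(a_hat, b_hat, var_a, var_b, cov_ab, level=0.05):
    """One-sided test of H0: alpha + beta >= 1 against H1: alpha + beta < 1.

    Returns the t statistic and whether H0 is rejected, using a large-sample
    normal critical value for the lower tail (about -1.645 at the 5% level).
    """
    se = (var_a + var_b + 2.0 * cov_ab) ** 0.5  # se of alpha_hat + beta_hat
    t = (a_hat + b_hat - 1.0) / se
    crit = NormalDist().inv_cdf(level)
    return t, t < crit


# Made-up estimates for illustration only.
print(returns_to_scale_test(0.30, 0.40, 0.01, 0.01, 0.0))
```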

(c) [Information] Assume constant returns to scale. Economic theory predicts that expenditure shares are constant. In particular, when inputs are optimally chosen and there are constant returns to scale,

$$\frac{p_K K}{p_Y Y} = \alpha.$$
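
The following short derivation assumes a price-taking firm whose capital choice satisfies the usual interior first-order condition, in which the value of the marginal product equals the input price.

$$\frac{\partial Y}{\partial K} = \alpha K^{\alpha - 1} L^{\beta} U = \alpha \frac{Y}{K}, \qquad p_Y \frac{\partial Y}{\partial K} = p_K \;\Longrightarrow\; \frac{p_K K}{p_Y Y} = \alpha.$$

The right-hand side involves neither $p_K$ nor $p_L$, which is the property the next part builds on.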

(d) [7 points] An implication is that $p_K K/(p_Y Y)$ does not depend on the price of labor. In non-Cobb-Douglas production functions, in contrast, the capital expenditure share $p_K K/(p_Y Y)$ would in general depend on the price of labor. How would you use this idea to formally test the hypothesis that firms maximize a constant-returns-to-scale Cobb-Douglas production function? What specification would you estimate? What test statistic would you construct? When would you reject the null hypothesis at the 1% significance level?
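
One possible specification, offered as a sketch rather than the official solution, regresses each firm's capital expenditure share on log input and output prices and tests whether the wage enters. The coefficient labels $\delta_0, \dots, \delta_3$ are hypothetical.

$$s_i \equiv \frac{p_{K,i} K_i}{p_{Y,i} Y_i} = \delta_0 + \delta_1 \ln p_{K,i} + \delta_2 \ln p_{L,i} + \delta_3 \ln p_{Y,i} + e_i, \qquad H_0: \delta_2 = 0.$$

With $t = \hat\delta_2 / \mathrm{se}(\hat\delta_2)$, a two-sided test rejects at the 1% level when $|t| > 2.576$, the large-sample normal critical value. In small samples the corresponding t critical value is slightly larger.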

(e) [7 points] Suppose that, when the firm produces some amount Y, you observe $\tilde{Y} = Y \times V$, where V is independent of $(Y, K, L, U, p_K, p_L, p_Y)$ and has mean 1. You do not observe the true Y. How would this affect your estimates in part (a)? In particular, would your estimates of $\alpha$ and $\beta$ be biased?
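
Taking logs of the observed output shows where the measurement error goes. Here $\gamma_0 = E[\ln U]$ and $\varepsilon_i = \ln U_i - E[\ln U]$ are illustrative labels for the intercept and error of a log-linear version of the part (a) model.

$$\ln \tilde{Y}_i = \ln Y_i + \ln V_i = \left(\gamma_0 + E[\ln V]\right) + \alpha \ln K_i + \beta \ln L_i + \left(\varepsilon_i + \ln V_i - E[\ln V]\right).$$

Under these assumptions, the added term $\ln V_i - E[\ln V]$ has mean zero and is independent of $\ln K_i$ and $\ln L_i$. It therefore shifts only the intercept and adds noise (larger standard errors), and it does not bias the slope estimates of $\alpha$ and $\beta$. The intercept does shift, because $E[\ln V] \le \ln E[V] = 0$ by Jensen's inequality, with equality only if V is constant.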

(f) [7 points] Suppose that, when the firm uses an amount K of capital, you observe $\tilde{K} = K \times V$, where V is independent of $(Y, K, L, U, p_K, p_L, p_Y)$ and has mean 1, but you do not observe the true K. How would this affect your estimates in part (a)? In particular, would your estimates of $\alpha$ and $\beta$ be biased?
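
The same log transformation can be applied to mismeasured capital, using the same illustrative labels $\gamma_0 = E[\ln U]$ and $\varepsilon_i = \ln U_i - E[\ln U]$.

$$\ln \tilde{K}_i = \ln K_i + \ln V_i \;\Longrightarrow\; \ln Y_i = \left(\gamma_0 - \alpha E[\ln V]\right) + \alpha \ln \tilde{K}_i + \beta \ln L_i + \left(\varepsilon_i - \alpha\left(\ln V_i - E[\ln V]\right)\right).$$

The composite error now contains $\ln V_i$, which also enters the regressor $\ln \tilde{K}_i$, so the two are correlated. This is classical measurement error in a regressor. The OLS estimate of $\alpha$ is biased toward zero (attenuation bias, which does not disappear in large samples). The estimate of $\beta$ is generally biased as well when $\ln L$ is correlated with $\ln K$.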

Problem 2 [35 points] In this problem we will try to work toward understanding whether government-subsidized savings accounts help people save for retirement, and if so, by how much. We’ll do this using the 401ksubs.dta dataset attached to this problem set. This is a cross-sectional dataset of individuals that includes information on basic demographics, their income and wealth, and whether they participate in a 401(k) account.

a) [7 points] Start by running a naïve regression. Regress net total assets on the dummy variable indicating whether the respondent has a 401(k) account. Interpret the sign and magnitude of the coefficient. Can you give this estimate a causal interpretation? Why (not)?
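
With a single 0/1 regressor, the OLS slope equals the difference between the mean outcome of participants and non-participants, and the intercept equals the non-participants' mean. The Python check below uses made-up numbers, not the 401ksubs data, and the function names are hypothetical.

```python
from statistics import mean


def ols_slope_intercept(y, x):
    """Simple-regression OLS: slope = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def group_means(y, d):
    """Mean outcome for the d == 0 group and for the d == 1 group."""
    m0 = mean([yi for yi, di in zip(y, d) if di == 0])
    m1 = mean([yi for yi, di in zip(y, d) if di == 1])
    return m0, m1


# Made-up data: three participants (d = 1) and two non-participants (d = 0).
y = [10.0, 20.0, 30.0, 5.0, 15.0]
d = [1, 1, 1, 0, 0]
print(ols_slope_intercept(y, d))
print(group_means(y, d))
```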

b) [7 points] Now add in the dummy for eligibility for a 401(k) account and interpret the coefficient. (Hint: Can you have a 401(k) account if you are not eligible?) Does the coefficient on eligibility imply that being eligible for a 401(k) lowers savings? Why (not)? What omitted factors do you think are being picked up here?

c) [7 points] Now let’s drop eligibility from the regression but add a set of controls: the dummy for IRA participation, age, age squared, family size, income, income squared, the male dummy, and the marriage dummy. Interpret five of the coefficients. How does the coefficient on p401k change? Now do you think you can interpret the coefficient on p401k as causal? Why (not)?

d) [7 points] Let’s explore the possibility that the controls matter differently for men than for women. Run a regression of net total assets on the dummy for 401(k) participation and all the controls, as well as their interactions with the male dummy. Interpret the coefficient on p401k and the interaction with the male dummy for two of the controls. Test whether the interactions of the controls with the male dummy are jointly significant. How does this change whether you think the coefficient on p401k is causal?
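
One textbook version of the joint test compares the sum of squared residuals with the interactions (unrestricted) and without them (restricted). The helper below is a hypothetical sketch of that classical F statistic. It assumes homoskedastic errors. After `regress ..., robust`, Stata's `test` command instead reports a Wald-type F based on the robust covariance matrix.

```python
def f_stat(ssr_restricted, ssr_unrestricted, q, n, k):
    """Classical F statistic for q linear restrictions (homoskedastic errors).

    n is the sample size and k the number of slope coefficients in the
    unrestricted model (intercept excluded), so its residual degrees of
    freedom are n - k - 1.
    """
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (n - k - 1)
    return numerator / denominator


# Made-up inputs for illustration: 4 restrictions, 105 observations, 10 slopes.
print(f_stat(120.0, 100.0, 4, 105, 10))
```

The statistic is compared with the F(q, n - k - 1) critical value. A large value indicates that the interactions are jointly significant.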

e) [7 points] Finally, let’s see whether 401(k) participation affects savings differently for men and women. Run the regression from part d), but also interact the 401(k) participation dummy with the male dummy. What does this regression imply is the effect of 401(k) participation on savings for women? For men? Test whether the effect for men equals 0. Test whether the effect for women equals 0. Test whether the effect is the same for men as for women.
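
Writing the part e) regression compactly may help. Here $A_i$ (net total assets) and the coefficient labels are illustrative, and male is the male dummy.

$$A_i = \beta_0 + \beta_1\, \text{p401k}_i + \beta_2\, (\text{p401k}_i \times \text{male}_i) + \text{controls and their interactions with } \text{male}_i + u_i.$$

Under this specification, the participation effect for women (male = 0) is $\beta_1$ and for men (male = 1) it is $\beta_1 + \beta_2$. The three requested tests are $H_0: \beta_1 + \beta_2 = 0$ (men), $H_0: \beta_1 = 0$ (women), and $H_0: \beta_2 = 0$ (same effect for men and women).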

Incidentally, you can access this dataset, and all the other datasets used in the Stock & Watson (https://fmwww.bc.edu/ec-p/data/stockwatson/datasets.list.html) and Wooldridge (http://fmwww.bc.edu/ec-p/data/wooldridge/datasets.list.html) textbooks, through Boston College. They have also set up a convenient Stata command, bcuse, that reads them straight into Stata. To install it, type `ssc install bcuse` in Stata. You can then load this dataset by typing `bcuse 401ksubs.dta, clear`.