Exercises 9

Exercises

Time for Series

Ralf Martin https://mondpanther.github.io/wwwmondpanther/
2021-11-23

Exercise 9.1

The oj.csv dataset contains monthly data on the log price of (frozen) orange juice concentrate (for the US) as the variable lnp. A large part of the orange production in the US originates from Florida. Hence, the weather in Florida is potentially an important factor in the orange price. Frost is rare in Florida, but when it happens it is particularly detrimental to the orange harvest. The variable fdd contains the number of freezing degree days in a month.

library(dplyr)      # provides %>% and mutate()
library(lubridate)  # provides as_date()

oj=read.csv("https://github.com/mondpanther/datastorieshub/raw/master/data/oj.csv")
oj=oj %>% mutate(date=as_date(date))  # parse the date column as a Date

  1. Run a regression of the (log) orange juice price on freezing degree days in the previous month and interpret the regression.

To create a lagged version of the fdd variable we can use the dplyr::lag() function:

library(dplyr)
oj=oj %>% mutate(L1fdd=dplyr::lag(fdd))
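
To see what the lag does, here is a minimal sketch on a made-up four-month data frame (the `toy` data and its values are assumptions for illustration, not taken from oj.csv). Each row of `L1fdd` holds the previous row's `fdd`, and the first row has no predecessor, so it becomes `NA`:

```r
library(dplyr)

# Hypothetical toy data: four months of freezing degree days
toy <- data.frame(month = 1:4, fdd = c(0, 3, 0, 5))

# Same pattern as above: L1fdd is fdd shifted down by one row
toy <- toy %>% mutate(L1fdd = dplyr::lag(fdd))

print(toy)
# L1fdd is NA, 0, 3, 0: month 2 sees month 1's value, and so on
```

The leading `NA` is why the regression output below reports one observation deleted due to missingness.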

Now run a regression of the log price on lagged freezing degree days:

lm(lnp~L1fdd,oj) %>% summary()

Call:
lm(formula = lnp ~ L1fdd, data = oj)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.45826 -0.17130 -0.00483  0.12521  0.60393 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 4.674738   0.009079 514.916   <2e-16 ***
L1fdd       0.002637   0.002708   0.974    0.331    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2257 on 638 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.001484,  Adjusted R-squared:  -8.083e-05 
F-statistic: 0.9484 on 1 and 638 DF,  p-value: 0.3305

This suggests that freezing degree days have a positive impact on the price, as we would expect: frost means there are fewer oranges around, so the price goes up. One additional freezing degree day in the previous month is associated with a price that is about 0.26% higher (the coefficient is 0.002637 log points). However, this result is not statistically significant (p = 0.33).
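
Because the dependent variable is a log price, the coefficient can be read as an approximate percentage change. A small sketch of that arithmetic, using only the estimate and standard error printed in the output above (the variable names are chosen here for illustration):

```r
b  <- 0.002637   # L1fdd coefficient from the regression output
se <- 0.002708   # its standard error from the same output

approx_pct <- 100 * b              # log-point approximation, in percent
exact_pct  <- 100 * (exp(b) - 1)   # exact percentage change implied by the log model
t_stat     <- b / se               # matches the reported t value of about 0.974

round(c(approx_pct = approx_pct, exact_pct = exact_pct, t_stat = t_stat), 3)
```

For a coefficient this small, the approximation and the exact figure agree to two decimal places, so quoting roughly 0.26% is fine.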

  2. Can you suggest reasons why the result in part 1 might not be an unbiased estimate of the effect of freezing on orange juice prices?

Freezing is likely to be exogenous, so we don’t have to worry about the usual confounding factors. However, we are dealing with time series data. One issue could be that the price series has a unit root, in which case a regression in levels can produce unreliable estimates and standard errors. Plotting the series is a good start to examine this:

library(ggplot2)  # needed for ggplot(), geom_line() and theme_minimal()
p=ggplot(oj,aes(x=date,y=lnp))+geom_line(color="green")+theme_minimal()+xlab("Monthly data")
plot(p)

  3. Can you check whether the price series has a unit root?

We can use the Dickey-Fuller test:

library(urca)
summary(ur.df(oj$lnp))

############################################### 
# Augmented Dickey-Fuller Test Unit Root Test # 
############################################### 

Test regression none 


Call:
lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.28240 -0.01007 -0.00124  0.00659  0.41180 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
z.lag.1    -0.0002675  0.0004203  -0.636  0.52480   
z.diff.lag  0.1284067  0.0392853   3.269  0.00114 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.04975 on 637 degrees of freedom
Multiple R-squared:  0.01722,   Adjusted R-squared:  0.01413 
F-statistic:  5.58 on 2 and 637 DF,  p-value: 0.003957


Value of test-statistic is: -0.6363 

Critical values for test statistics: 
      1pct  5pct 10pct
tau1 -2.58 -1.95 -1.62

The test statistic (-0.6363) is larger, i.e. less negative, than even the 10pct critical value (-1.62). Hence we cannot reject the null hypothesis that there is a unit root.
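
The decision rule can be written out explicitly. The sketch below hard-codes the statistic and critical values from the output above. `rejects_unit_root` is a hypothetical helper defined here for illustration and is not part of urca:

```r
# Hypothetical helper: the unit-root null is rejected only when the
# test statistic is below (more negative than) the critical value
rejects_unit_root <- function(stat, crit) stat < crit

# Critical values for tau1 as printed by summary(ur.df(oj$lnp))
crit <- c("1pct" = -2.58, "5pct" = -1.95, "10pct" = -1.62)

# Test statistic for the level of the log price
rejects_unit_root(-0.6363, crit)
# FALSE at all three levels: the unit root cannot be rejected
```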

  4. Can you suggest an alternative (unbiased) approach to estimating the effect of freezing on price?

We need to difference the series to get rid of the unit root. Note that for the differenced series we clearly reject the unit root:

ur.df(diff(oj$lnp)) %>% summary()

############################################### 
# Augmented Dickey-Fuller Test Unit Root Test # 
############################################### 

Test regression none 


Call:
lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.28373 -0.01140 -0.00246  0.00605  0.41061 

Coefficients:
           Estimate Std. Error t value Pr(>|t|)    
z.lag.1    -0.85301    0.05233 -16.300   <2e-16 ***
z.diff.lag -0.02086    0.03965  -0.526    0.599    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.04979 on 636 degrees of freedom
Multiple R-squared:  0.4358,    Adjusted R-squared:  0.434 
F-statistic: 245.7 on 2 and 636 DF,  p-value: < 2.2e-16


Value of test-statistic is: -16.3003 

Critical values for test statistics: 
      1pct  5pct 10pct
tau1 -2.58 -1.95 -1.62
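
The test above differences with base R's diff(), while the regression that follows builds the difference inside mutate() as lnp minus its lag. A minimal sketch on made-up log prices (the `lnp_toy` values are assumptions, not from oj.csv) suggests the two agree apart from the leading `NA`:

```r
library(dplyr)

lnp_toy <- c(4.60, 4.62, 4.61, 4.70)   # hypothetical log prices

d_base <- diff(lnp_toy)                  # length 3: drops the first period
d_lag  <- lnp_toy - dplyr::lag(lnp_toy)  # length 4: first element is NA

length(d_base)               # 3
length(d_lag)                # 4
all.equal(d_base, d_lag[-1]) # TRUE
```

The lag-based version keeps the original length, which is what mutate() needs in order to add the result as a new column.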
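
We therefore regress the first difference of the log price on lagged freezing degree days (a time index t is also created for use as a trend):

oj=oj %>% mutate(Dlnp=lnp-dplyr::lag(lnp))%>% mutate(t=1:n())
lm( Dlnp~L1fdd,oj) %>% summary()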

Call:
lm(formula = Dlnp ~ L1fdd, data = oj)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.28200 -0.01001 -0.00095  0.00560  0.41224 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept) -0.0020334  0.0020063  -1.014    0.311  
L1fdd        0.0014905  0.0005984   2.491    0.013 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.04989 on 638 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.009629,  Adjusted R-squared:  0.008077 
F-statistic: 6.203 on 1 and 638 DF,  p-value: 0.01301

Adding a linear time trend leaves the L1fdd coefficient virtually unchanged:

lm( Dlnp~L1fdd+t,oj) %>% summary()

Call:
lm(formula = Dlnp ~ L1fdd + t, data = oj)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.28230 -0.00966 -0.00109  0.00568  0.41178 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept) -9.885e-04  3.972e-03  -0.249   0.8035  
L1fdd        1.494e-03  5.990e-04   2.494   0.0129 *
t           -3.257e-06  1.068e-05  -0.305   0.7606  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.04992 on 637 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.009773,  Adjusted R-squared:  0.006664 
F-statistic: 3.144 on 2 and 637 DF,  p-value: 0.0438

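The estimated effect is smaller than in the levels regression: one extra freezing degree day in the previous month raises the orange juice price by about 0.149%, which suggests that the levels estimate was biased upward. Unlike before, the result is now statistically significant at the 5% level (p = 0.013).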
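
Why differencing helps can be illustrated with simulated data. The sketch below is not based on oj.csv: it assumes a price whose monthly change equals a true effect `beta` times last month's exogenous shock plus noise, so the price level has a unit root by construction. The sample size, seed, and `beta` are arbitrary assumptions.

```r
set.seed(1)    # arbitrary seed so the sketch is reproducible
n    <- 2000   # number of simulated periods (assumption)
beta <- 0.5    # true effect of last period's shock on the price change (assumption)

x     <- rnorm(n)                 # exogenous shock, playing the role of fdd
x_lag <- c(NA, x[-n])             # shock in the previous period
dy    <- beta * x_lag + rnorm(n)  # price change = lagged shock effect + noise
dy[1] <- 0                        # no lagged shock exists in period 1
y     <- cumsum(dy)               # price level: accumulates all past changes

sim <- data.frame(y = y, Dy = c(NA, diff(y)), x_lag = x_lag)

# Regression in levels versus regression in first differences
b_level <- unname(coef(lm(y ~ x_lag, sim))["x_lag"])
b_diff  <- unname(coef(lm(Dy ~ x_lag, sim))["x_lag"])

c(true = beta, levels = b_level, differences = b_diff)
```

In this setup the differenced regression should land close to `beta`, while the levels estimate is far noisier because its error term is itself a random walk. Rerunning with different seeds makes the contrast visible.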

Citation

For attribution, please cite this work as

Martin (2021, Nov. 23). Datastories Hub: Exercises 9. Retrieved from https://mondpanther.github.io/datastorieshub/posts/exercises/exercises9/

BibTeX citation

@misc{martin2021exercises,
  author = {Martin, Ralf},
  title = {Datastories Hub: Exercises 9},
  url = {https://mondpanther.github.io/datastorieshub/posts/exercises/exercises9/},
  year = {2021}
}