Time for Series
The oj.csv dataset contains monthly data on the log price of (frozen) orange juice concentrate in the US as the variable lnp. A large part of US orange production originates in Florida, so the weather there is potentially an important factor in the orange price. Frost is rare in Florida, but when it happens it is particularly detrimental to the orange harvest. The variable fdd records the number of freezing degree days in each month.
library(dplyr)
library(ggplot2)
library(lubridate)

oj = read.csv("https://github.com/mondpanther/datastorieshub/raw/master/data/oj.csv")
oj = oj %>% mutate(date = as_date(date))

ggplot(oj, aes(x = date, y = lnp)) +
  geom_line(color = "green") +
  theme_minimal() +
  xlab("Monthly data")
To create a lagged version of the fdd variable we can use the dplyr::lag() function:
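For instance, a sketch of the lag step (assuming the data are sorted by date; the name L1fdd matches the variable in the regression output below):

```r
# One-month lag of freezing degree days; data must be ordered by date first
oj = oj %>%
  arrange(date) %>%
  mutate(L1fdd = dplyr::lag(fdd))
```

Note that dplyr::lag() simply shifts the vector by one position, so the first observation of L1fdd is NA; lm() drops that row automatically ("1 observation deleted due to missingness" in the output below).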
Now run a regression of the price on freezing degree days:
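A sketch of the call that produces the output below:

```r
# Regress the log price on lagged freezing degree days
reg1 = lm(lnp ~ L1fdd, data = oj)
summary(reg1)
```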
Call:
lm(formula = lnp ~ L1fdd, data = oj)
Residuals:
Min 1Q Median 3Q Max
-0.45826 -0.17130 -0.00483 0.12521 0.60393
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.674738 0.009079 514.916 <2e-16 ***
L1fdd 0.002637 0.002708 0.974 0.331
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2257 on 638 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.001484, Adjusted R-squared: -8.083e-05
F-statistic: 0.9484 on 1 and 638 DF, p-value: 0.3305
This suggests that freezing degree days have a positive impact on the price, as we would expect: freezing means there are fewer oranges around, so the price goes up. One additional freezing degree day would imply an increase in the price of about 0.26%. However, this result is not statistically significant.
Freezing is likely to be exogenous, so we don't have to worry about the usual confounding factors. However, we are dealing with time series, and one issue could be that the price series has a unit root. Plotting the series is a good start to examine this:
We can use the Dickey-Fuller test:
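The output below is in the format produced by ur.df() from the urca package. A sketch of the call, assuming a lag order of 1 (inferred from the single z.diff.lag term in the test regression):

```r
library(urca)  # assumed: urca package for the Augmented Dickey-Fuller test

# ADF test on the level of the log price; "none" matches
# "Test regression none" in the output below
adf_level = ur.df(oj$lnp, type = "none", lags = 1)
summary(adf_level)
```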
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression none
Call:
lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
Residuals:
Min 1Q Median 3Q Max
-0.28240 -0.01007 -0.00124 0.00659 0.41180
Coefficients:
Estimate Std. Error t value Pr(>|t|)
z.lag.1 -0.0002675 0.0004203 -0.636 0.52480
z.diff.lag 0.1284067 0.0392853 3.269 0.00114 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.04975 on 637 degrees of freedom
Multiple R-squared: 0.01722, Adjusted R-squared: 0.01413
F-statistic: 5.58 on 2 and 637 DF, p-value: 0.003957
Value of test-statistic is: -0.6363
Critical values for test statistics:
1pct 5pct 10pct
tau1 -2.58 -1.95 -1.62
The test statistic (-0.64) is larger than even the 10% cut-off (-1.62). Hence we cannot reject the hypothesis that there is a unit root.
We need to difference the series to get rid of the unit root. Note that for the differenced series we clearly reject the unit root:
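A sketch of the differencing step and the repeated test (the name Dlnp matches the variable used in the regressions below):

```r
# First difference of the log price, i.e. the monthly growth rate of the OJ price
oj = oj %>%
  arrange(date) %>%
  mutate(Dlnp = lnp - dplyr::lag(lnp))

# ADF test on the differenced series (dropping the leading NA)
adf_diff = ur.df(na.omit(oj$Dlnp), type = "none", lags = 1)
summary(adf_diff)
```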
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression none
Call:
lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
Residuals:
Min 1Q Median 3Q Max
-0.28373 -0.01140 -0.00246 0.00605 0.41061
Coefficients:
Estimate Std. Error t value Pr(>|t|)
z.lag.1 -0.85301 0.05233 -16.300 <2e-16 ***
z.diff.lag -0.02086 0.03965 -0.526 0.599
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.04979 on 636 degrees of freedom
Multiple R-squared: 0.4358, Adjusted R-squared: 0.434
F-statistic: 245.7 on 2 and 636 DF, p-value: < 2.2e-16
Value of test-statistic is: -16.3003
Critical values for test statistics:
1pct 5pct 10pct
tau1 -2.58 -1.95 -1.62
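We can therefore re-run the price regression in differences. A sketch of the call matching the output below:

```r
# Regress the differenced log price on lagged freezing degree days
reg2 = lm(Dlnp ~ L1fdd, data = oj)
summary(reg2)
```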
Call:
lm(formula = Dlnp ~ L1fdd, data = oj)
Residuals:
Min 1Q Median 3Q Max
-0.28200 -0.01001 -0.00095 0.00560 0.41224
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0020334 0.0020063 -1.014 0.311
L1fdd 0.0014905 0.0005984 2.491 0.013 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.04989 on 638 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.009629, Adjusted R-squared: 0.008077
F-statistic: 6.203 on 1 and 638 DF, p-value: 0.01301
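As a further check we can include a linear time trend. A sketch, assuming t is a simple observation counter (the output below only shows that a variable named t enters the formula):

```r
# Add a linear time trend and re-estimate
oj = oj %>% mutate(t = row_number())
reg3 = lm(Dlnp ~ L1fdd + t, data = oj)
summary(reg3)
```

The trend turns out to be insignificant, so it makes little difference to the L1fdd coefficient.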
Call:
lm(formula = Dlnp ~ L1fdd + t, data = oj)
Residuals:
Min 1Q Median 3Q Max
-0.28230 -0.00966 -0.00109 0.00568 0.41178
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -9.885e-04 3.972e-03 -0.249 0.8035
L1fdd 1.494e-03 5.990e-04 2.494 0.0129 *
t -3.257e-06 1.068e-05 -0.305 0.7606
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.04992 on 637 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.009773, Adjusted R-squared: 0.006664
F-statistic: 3.144 on 2 and 637 DF, p-value: 0.0438
We now find a smaller effect than before: one extra freezing degree day leads to about 0.149% higher orange juice prices (i.e. the earlier estimate was biased upward). However, the result is now statistically significant.
For attribution, please cite this work as
Martin (2021, Nov. 23). Datastories Hub: Exercises 9. Retrieved from https://mondpanther.github.io/datastorieshub/posts/exercises/exercises9/
BibTeX citation
@misc{martin2021exercises,
  author = {Martin, Ralf},
  title = {Datastories Hub: Exercises 9},
  url = {https://mondpanther.github.io/datastorieshub/posts/exercises/exercises9/},
  year = {2021}
}