Chapter
3 Add.
Additional comments about fitting ARIMA models and forecasting with the resulting model
Random walk model
A random walk model is discussed on p. 14 of Shumway and Stoffer. The model is simply an ARIMA(0,1,0):
xt = xt-1 + wt where wt ~ ind. N(0, )
The name comes about through xt being xt-1 plus a random “movement”. In its more general form, wt~white noise.
A drift term, , can be added to the model,
xt = + xt-1 + wt
This can be rewritten then as a cumulative sum,
xt = + ( + xt-2 + wt-1) + wt = 2 + xt-2 + wt-1 + wt
= 2 + ( + xt-3 + wt-2) + wt-1 + wt = 3 + xt-3 + wt-2 +
wt-1 + wt
xt = t +
One can see that the drift term creates a trend in the series (i.e., allows xt to drift away from 0). Notice the random walk model with a drift term is not stationary.
Shumway and Stoffer’s issues with R and time series
See #2 and #3 at http://www.stat.pitt.edu/stoffer/tsa3/
Rissues.htm.
When arima() is used with differencing, the include.mean = option is set to FALSE. This is because differencing removes the trend and allows for the differenced series to have a mean of 0. Note that even if you say include.mean = TRUE, there will be no “mean” estimated. Here’s what R’s arima() help says for the include.mean option,
include.mean |
Should the ARIMA model include a mean term? The default is TRUE for undifferenced series, FALSE for differenced ones (where a mean would not affect the fit nor predictions). |
As shown in the random walk model example, there may be times where you do want to estimate a drift term, like , still. This leads to an ARIMA model with the drift term,
(1-1B-2B2-…-pBp)xt = + (1+1B+2B2+…+qBq)wt
(B)xt = + (B)wt
Most time series textbooks will not include the constant term in the model when there is differencing. For example, Wei (1990, p. 71-72) says (referring to as 0),
We assume that 0 = 0, unless it is clear from the data or the nature of the problem that a deterministic trend is really needed.
This is why R does not include the possible estimation of in the arima() function.
If you did want to include , Shumway and Stoffer show in their textbook and web page a few ways to estimate it. These methods are described below:
Use xreg = 1:length(x), where x contains the data in the arima().
Use Shumway and Stoffer’s sarima() and sarima.for() functions for fitting the model and forecasting. These are available in tsa3.rda. The sarima() name comes from an extension of the ARIMA model that will be discussed in Section 3.9.
Do all differencing needed BEFORE invoking arima(). Use d = 0 in arima() and include.mean = TRUE.
Below are examples of their implementations. Also, see Example 3.38 in Shumway and Stoffer for when was needed.
Example: ARIMA(1,1,1) with 1 = 0.7, 1 = 0.4, = 9, n = 200 (arima111_sim.R, arima111.xls)
Of course, should be 0 for this data set since the data were simulated directly from an ARIMA(1,1,1). However, we can still investigate what would happen if was estimated.
First, the original model’s fit has been reproduced below.
> library(RODBC)
> z<-odbcConnectExcel("C:\\chris\\UNL\\STAT_time_series\\
chapter3\\arima111.xls")
> arima111<-sqlFetch(z, "Sheet1")
> close(z)
> x<-arima111$x
> mod.fit<-arima(x = x, order = c(1, 1, 1))
> mod.fit
Call:
arima(x = x, order = c(1, 1, 1))
Coefficients:
ar1 ma1
0.6720 0.4681
s.e. 0.0637 0.0904
sigma^2 estimated as 9.558: log likelihood = -507.68, aic = 1021.36
> #Forecasts 5 time periods into the future
> fore.mod<-predict(object = mod.fit, n.ahead = 5, se.fit
= TRUE)
> fore.mod
$pred
Time Series:
Start = 201
End = 205
Frequency = 1
[1] -486.3614 -484.9361 -483.9784 -483.3348 -482.9023
$se
Time Series:
Start = 201
End = 205
Frequency = 1
[1] 3.091673 7.303206 11.578890 15.682551 19.534208
The estimated model is
(1 0.6720B)(1 B)xt = (1+0.4681B)wt with = 9.56
Now, the models are fit including an estimate of .
Using xreg = 1:length(x) in arima() where x contains the data, produces the following,
> mod.fit2<-arima(x = x, order = c(1, 1, 1), xreg =
1:length(x))
> mod.fit2
Call:
arima(x = x, order = c(1, 1, 1), xreg = 1:length(x))
Coefficients:
ar1 ma1 1:length(x)
0.6381 0.4825 -1.7204
s.e. 0.0671 0.0894 0.8814
sigma^2 estimated as 9.404: log likelihood = -506.03, aic = 1020.05
> fore.mod2<-predict(object = mod.fit2, n.ahead = 5,
se.fit = TRUE, newxreg = (length(x)+1):(length(x)+5))
> fore.mod2
$pred
Time Series:
Start = 201
End = 205
Frequency = 1
[1] -486.7411 -486.2527 -486.5636 -487.3846 -488.5312
$se
Time Series:
Start = 201
End = 205
Frequency = 1
[1] 3.066621 7.189985 11.283559 15.140585 18.707819
> mod.fit2$coef[3]/sqrt(mod.fit2$var.coef[3,3])
1:length(x)
-1.952
Note the newxreg option used in predict() and the values that I put into it. The estimated model is
(1 0.6381B)(1 B)xt = -1.7204 + (1 + 0.4825B)wt with = 9.404
A Wald test was performed for Ho: = 0 vs. Ha: 0, and, surprisingly, is marginally significant. This is an example of a type I error!
Using Shumway and Stoffer’s functions produces,
> #First, R needs to read in the functions
>
load("C:\\chris\\unl\\Dropbox\\NEW\\STAT_time_series\\
tsa3.rda")
> mod.fit.SS<-sarima(xdata = x, p = 1, d = 1, q = 1)
> mod.fit.SS
$fit
Call:
arima(x = data, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), xreg = constant, include.mean = F)
Coefficients:
ar1 ma1 constant
0.6381 0.4825 -1.7204
s.e. 0.0671 0.0894 0.8814
sigma^2 estimated as 9.404: log likelihood = -506.03, aic = 1020.05
$AIC
[1] 3.271153
$AICc
[1] 3.282178
$BIC
[1] 2.320628
> names(mod.fit.SS)
[1] "fit" "AIC" "AICc" "BIC"
> mod.fit.SS$fit$coef
ar1 ma1 constant
0.6381129 0.4825299 -1.7204478
> mod.fit.SS$fit$var.coef
ar1 ma1 constant
ar1 0.0045019721 -0.0035447116 -0.0002140191
ma1 -0.0035447116 0.0079866617 0.0003867202
constant -0.0002140191 0.0003867202 0.7768252171
> mod.fit.SS$fit$coef[3]/
sqrt(mod.fit.SS$fit$var.coef[3,3])
constant
-1.952
The plot above that includes information about the residuals will be discussed in Section 3.8. The estimated model is
(1 0.6381B)(1 B)xt = -1.7204 + (1 + 0.4825B)wt with = 9.404
The forecasts are shown below.
> save.for<-sarima.for(xdata = x, n.ahead = 5, p = 1, d =
1, q = 1)
> save.for
$pred
Time Series:
Start = 201
End = 205
Frequency = 1
[1] -486.7411 -486.2527 -486.5636 -487.3846 -488.5312
$se
Time Series:
Start = 201
End = 205
Frequency = 1
[1] 3.066621 7.189985 11.283559 15.140585 18.707819
Unfortunately, Shumway and Stoffer do not allow for additional information to be passed into their functions. For example, you can not use the main = option that we have used to put a title on a plot. Instead, you would need to use a separate call to the title() function.
Do all differencing needed BEFORE invoking arima(). Use d = 0 in arima() and include.mean = TRUE. See the program for partial implementation of this method. #1 and #2 are easier so these are the only methods to be used in this class.
2011 Christopher R. Bilder
CONFIGURING USER STATE MANAGEMENT FEATURES 73 CHAPTER 7 IMPLEMENTING
INTERPOLATION 41 CHAPTER 5 INTERPOLATION THIS CHAPTER SUMMARIZES POLYNOMIAL
PREPARING FOR PRODUCTION DEPLOYMENT 219 CHAPTER 4 DESIGNING A
Tags: about fitting, information about, about, add10, arima, fitting, models, chapter, additional, comments