As promised, this is the second post in my two-part blog series on time series modelling and forecasting. In my first blog post I discussed the basics of time series analysis and gave a theoretical overview. In case you missed it, you can find it here: Understanding Time Series Modelling and Forecasting, Part 1
Table of Contents
- Identifying a Possible Model
- Diagnosing a Selected Model
- Forecasting With ARIMA Models
- Seasonal ARIMA Models
Identifying a Possible Model
I have talked about this in detail in my previous blog post. Let's go over it briefly once again. There are three things to consider when making a first guess.
- The time series plot of the observed series – Look for seasonality and trend, and check that the plot represents a stationary time series. If it does not, look for trends and use differencing to detrend your series. If there is a curved upward trend along with increasing variance, consider transforming the series with a logarithm or square root. Non-constant variance with no trend needs to be dealt with using ARCH models (I have not discussed ARCH models here).
- The ACF plot of the series
- The PACF plot of the series
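The transformation and differencing mentioned in the first item can be sketched in a few lines. This is a minimal illustration in plain Python, not what R does internally; the helper name `difference` and both example series are made up for the demonstration.

```python
import math

def difference(series, lag=1):
    """Return the differenced series: series[t] - series[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical series with a linear trend: 3, 5, 7, ...
linear = [3 + 2 * t for t in range(8)]
# A first difference turns the linear trend into a constant.
print(difference(linear))

# Hypothetical series growing by 10% per step, so its spread grows with its level.
growth = [100 * 1.1 ** t for t in range(8)]
# Taking logs first makes the growth additive; differencing then flattens it.
log_growth = [math.log(v) for v in growth]
print(difference(log_growth))
```

After the log transform, every difference is approximately log(1.1), which is the kind of stable level we want before looking at the ACF and PACF.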
I have discussed this in my previous blog post, but I would like to repeat a very important table that I often refer to.
| |AR(p)|MA(q)|ARMA(p, q)|
|---|---|---|---|
|ACF|Tails off|Cuts off after lag q|Tails off|
|PACF|Cuts off after lag p|Tails off|Tails off|
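To see the pattern in the table on actual numbers, here is a rough sketch that computes the sample ACF and PACF in plain Python. The function names `acf` and `pacf` are my own, the PACF uses the Durbin-Levinson recursion, and the AR(1) series with coefficient 0.7 is simulated purely for illustration. In practice you would use the plotting functions of your statistical software.

```python
import random

def acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]

def pacf(series, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(series, max_lag)
    result = [1.0]
    prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1} from the previous step
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result

# Simulate a hypothetical AR(1) series: x_t = 0.7 * x_{t-1} + w_t
random.seed(0)
phi = 0.7
x = [0.0]
for _ in range(2999):
    x.append(phi * x[-1] + random.gauss(0, 1))

r = acf(x, 5)
p = pacf(x, 5)
for lag in range(1, 6):
    print(f"lag {lag}: ACF={r[lag]:+.2f}  PACF={p[lag]:+.2f}")
```

For an AR(1) series like this one, the ACF should decay gradually while the PACF should be close to zero after lag 1, matching the AR(p) column of the table.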
Diagnosing a Selected Model
Now that we have decided on a particular model, we need to estimate its coefficients. You usually do not have to worry about this, as R or any other statistical software will do it for you. Most software packages use maximum likelihood estimation to compute the estimates.
Once you get the coefficients you need to consider a few things –
- Check the significance of the coefficients. You can calculate the t-statistics or the p-values of the coefficients in R.
- Examine the ACF plot of the residuals. A good model will have all autocorrelations of the residuals non-significant. If this is not the case, you need to reconsider the order of your ARIMA model.
- Use Box-Pierce or Ljung-Box tests to check for residual autocorrelation at various lags. We will see how we can do this in the ARIMA modelling in R section.
- If you are concerned about non-constant variance, examine the plot of residuals versus fitted values and the time series plot of the residuals.
If any of the checks bothers you, revise your guess of the model selected. You might have to change the order of the ARIMA model.
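As a rough idea of what the Box-Pierce and Ljung-Box checks compute, here is a plain Python sketch of both statistics. The function names and both example series are my own. A large statistic relative to the chi-square critical value points to leftover autocorrelation. Note that for residuals from a fitted ARMA(p, q) model the degrees of freedom are usually taken as the number of lags minus p + q; the example below tests raw series, so it uses the number of lags directly.

```python
import random

def autocorrelations(values, max_lag):
    """Sample autocorrelations r_1..r_max_lag."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]

def box_pierce(residuals, max_lag):
    """Box-Pierce statistic: n * sum of squared autocorrelations."""
    n = len(residuals)
    r = autocorrelations(residuals, max_lag)
    return n * sum(rk ** 2 for rk in r)

def ljung_box(residuals, max_lag):
    """Ljung-Box statistic: n(n+2) * sum of r_k^2 / (n - k)."""
    n = len(residuals)
    r = autocorrelations(residuals, max_lag)
    return n * (n + 2) * sum(rk ** 2 / (n - k) for k, rk in enumerate(r, start=1))

# 5% critical value of the chi-square distribution with 10 degrees of freedom.
CHI2_95_DF10 = 18.307

random.seed(1)
white_noise = [random.gauss(0, 1) for _ in range(200)]  # what good residuals look like
alternating = [(-1) ** t for t in range(100)]           # strongly autocorrelated

for name, series in (("white noise", white_noise), ("alternating", alternating)):
    q = ljung_box(series, 10)
    verdict = "autocorrelation remains" if q > CHI2_95_DF10 else "looks fine"
    print(f"{name}: Q = {q:.1f} -> {verdict}")
```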
What if more than one model looks okay?
This happens very often. Once you perform the above steps, you may find that more than one model seems to work. Here are a few steps you can take:
- Prefer the model with fewer parameters.
- Examine the standard errors of the forecast values. Pick the model with the lowest standard errors.
- Use statistics such as the MSE (mean square error), AIC, AICc or BIC. Lower values of these statistics are desirable.
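For reference, the information criteria can be computed from a model's log-likelihood as sketched below. The function name, the two candidate models and their log-likelihood values are all hypothetical. Software packages also differ on whether the noise variance counts as a parameter, so only compare values produced under the same convention.

```python
import math

def information_criteria(log_likelihood, k, n):
    """AIC, AICc and BIC for a model with k estimated parameters fitted to n observations."""
    aic = -2 * log_likelihood + 2 * k
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)  # small-sample correction
    bic = -2 * log_likelihood + k * math.log(n)
    return {"AIC": aic, "AICc": aicc, "BIC": bic}

# Hypothetical fits of two candidate models to the same 100 observations.
model_a = information_criteria(log_likelihood=-140.2, k=3, n=100)
model_b = information_criteria(log_likelihood=-139.8, k=5, n=100)

for name, scores in (("A", model_a), ("B", model_b)):
    print(name, {key: round(value, 2) for key, value in scores.items()})
```

Here model B fits slightly better (higher log-likelihood), but the penalty for its two extra parameters outweighs the gain, so all three criteria favour model A.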
Forecasting With ARIMA Models
When we forecast a value past the end of the series, the right side of the equation might need values from the series that have not yet been observed. Again, statistical software like R will do this for you, but let us discuss the basic steps involved.
Let us consider the AR(2) model,
$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + w_t$$
Suppose that we have observed n data values and wish to forecast the values of $x_{n+1}$ and $x_{n+2}$. This can be done using the above equation.
$$x_{n+1} = \phi_1 x_n + \phi_2 x_{n-1} + w_{n+1}$$
$$x_{n+2} = \phi_1 x_{n+1} + \phi_2 x_n + w_{n+2}$$
We replace $w_{n+1}$ and $w_{n+2}$ by their expected value of 0 (the assumed mean for the errors). We use the forecasted value of $x_{n+1}$ to get the value of $x_{n+2}$.
In general, the forecasting procedure is as follows –
- For any $w_j$ with 1 ≤ j ≤ n, use the sample residual for time point j
- For any $w_j$ with j > n, use 0 as the value of $w_j$
- For any $x_j$ with 1 ≤ j ≤ n, use the observed value of $x_j$
- For any $x_j$ with j > n, use the forecasted value of $x_j$
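The procedure above can be sketched for the AR(2) case in a few lines of Python. This assumes a zero-mean model whose coefficients are already estimated; the function name `forecast_ar2`, the coefficient values and the two observations are made up for illustration.

```python
def forecast_ar2(observed, phi1, phi2, steps):
    """Forecast `steps` values past the end of `observed` with an AR(2) model."""
    history = list(observed)
    forecasts = []
    for _ in range(steps):
        # Future errors w_j (j > n) are replaced by their expected value, 0.
        next_value = phi1 * history[-1] + phi2 * history[-2]
        forecasts.append(next_value)
        # Later forecasts reuse earlier forecasts in place of unobserved values.
        history.append(next_value)
    return forecasts

# Hypothetical coefficients and last two observations x_{n-1} = 1.0, x_n = 2.0
print(forecast_ar2([1.0, 2.0], phi1=0.5, phi2=0.25, steps=2))  # [1.25, 1.125]
```

The first forecast uses only observed values (0.5 × 2.0 + 0.25 × 1.0), while the second mixes the first forecast with the last observation (0.5 × 1.25 + 0.25 × 2.0).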
Seasonal ARIMA Models
Seasonality in a time series is a regular pattern of changes that repeats over a fixed time period. Let's denote this fixed time period by S. For example, if monthly sales of a particular product increase every July, then S = 12 (months per year).
In a seasonal ARIMA model, seasonal AR and MA terms predict $x_t$ using data values and errors at times with lags that are multiples of S (the span of the seasonality).
- With daily data that follows a weekly pattern (S = 7), a stationary seasonal AR model of order 2 would depend on $x_{t-7}$ and $x_{t-14}$. The equation to represent it would be –
$$x_t = \phi_1 x_{t-7} + \phi_2 x_{t-14} + w_t$$
- Similarly, a seasonal MA model of order 1 and span 12 would be –
$$x_t = \theta_{12} w_{t-12} + w_t$$
Seasonality usually causes the series to be non-stationary because the average values at some particular times within the seasonal span (months, for example) may be different from the average values at other times. For instance, sales of blankets tend to be higher in the winter months.
Seasonal differencing is defined as the difference between a value and the value at a lag that is a multiple of S. With S = 24, a seasonal difference is $x_t - x_{t-24}$.
Seasonal differencing can also be combined with non-seasonal differencing.
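Here is a small sketch of seasonal differencing followed by a non-seasonal difference. The helper `seasonal_difference` and the toy series (a repeating pattern of length 4 on top of a quadratic trend) are invented for the example; S = 4 is used only to keep the output short.

```python
def seasonal_difference(series, period):
    """Return series[t] - series[t - period] for every t where both exist."""
    return [series[t] - series[t - period] for t in range(period, len(series))]

S = 4
pattern = [10, 20, 30, 40]
# Hypothetical series: repeating seasonal pattern plus a quadratic trend t^2.
series = [pattern[t % S] + t * t for t in range(12)]

# The seasonal difference removes the repeating pattern; a linear trend remains.
seasonal = seasonal_difference(series, S)
print(seasonal)  # [16, 24, 32, 40, 48, 56, 64, 72]

# A further first (non-seasonal) difference removes the remaining trend.
both = seasonal_difference(seasonal, 1)
print(both)      # [8, 8, 8, 8, 8, 8, 8]
```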
The above equations assumed that the non-seasonal orders are zero. A model with non-seasonal as well as seasonal orders is represented as –
$$\text{ARIMA}(p, d, q) \times (P, D, Q)_S$$
p is the non-seasonal AR order, d is the non-seasonal differencing order, q is the non-seasonal MA order, P is the seasonal AR order, D is the seasonal differencing order, Q is the seasonal MA order and S is the span of the seasonality.
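Therefore, an $\text{ARIMA}(1, 0, 1) \times (1, 0, 2)_{12}$ model would have the following equation, written here in a simplified additive form that leaves out the cross-product terms of the full multiplicative model –
$$x_t = \phi_1 x_{t-1} + \phi_{12} x_{t-12} + \theta_1 w_{t-1} + \theta_{12} w_{t-12} + \theta_{24} w_{t-24} + w_t$$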
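For readers who want the full multiplicative form, the sketch below expands it using the backshift operator $B$, where $B x_t = x_{t-1}$. The operator is notation I am assuming here (it is not used elsewhere in this post), and the MA terms are written with plus signs; some texts and software use the opposite sign convention.

```latex
\begin{align*}
(1 - \phi_1 B)(1 - \phi_{12} B^{12})\, x_t
  &= (1 + \theta_1 B)(1 + \theta_{12} B^{12} + \theta_{24} B^{24})\, w_t \\
x_t &= \phi_1 x_{t-1} + \phi_{12} x_{t-12} - \phi_1 \phi_{12}\, x_{t-13} \\
    &\quad + w_t + \theta_1 w_{t-1} + \theta_{12} w_{t-12} + \theta_1 \theta_{12}\, w_{t-13} \\
    &\quad + \theta_{24} w_{t-24} + \theta_1 \theta_{24}\, w_{t-25}
\end{align*}
```

The extra terms at lags 13 and 25 come from multiplying the non-seasonal and seasonal factors together. They introduce no new parameters, since their coefficients are products of the ones already in the model.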
Identifying a seasonal model
1. Examine the time series plot of the data for trend and seasonality. We usually know beforehand whether the data we have gathered is seasonal (months, weeks, years etc.) or not.
2. Do any necessary differencing –
   - If there is seasonality and no trend, take a seasonal difference (a difference at lag S). Seasonality will appear in the ACF as a slowly tapering pattern at multiples of S.
   - If there is a linear trend but no seasonality, apply a first difference. If there is a quadratic trend, apply a second-order difference.
   - If there is both trend and seasonality, first apply a seasonal difference. If the trend remains, apply the requisite non-seasonal difference (first order, second order etc.).
   - If there is no trend and no seasonality, no differencing is needed.
3. Examine the ACF and PACF plots of the differenced data (if differencing is necessary).
   - Non-seasonal terms – Examine the early lags to guess the non-seasonal terms. Spikes in the ACF (at low lags) indicate possible non-seasonal MA terms. Spikes in the PACF (at low lags) indicate possible non-seasonal AR terms.
   - Seasonal terms – Examine the patterns across lags that are multiples of S. For example, for daily data with a weekly pattern (S = 7), look at lags 7, 14, 21 and so on. The seasonal lags are judged in the same way as the earlier lags.
4. Use statistical software like R to estimate the coefficients of the chosen model.
5. Check the fitted model using the same diagnostic steps that we use for non-seasonal models. If the diagnostics are not good, redo step 3 above.
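To illustrate how the seasonal lags are read, the sketch below simulates a seasonal MA(1) series with S = 12 and prints the sample ACF at a few early lags and at multiples of S. Everything here is an assumption made for the demonstration: the coefficient 0.8, the series length, the `acf` helper, and the rough ±2/√n band used to flag a lag as significant.

```python
import random

def acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]

S = 12
theta = 0.8   # hypothetical seasonal MA coefficient
n = 4000
random.seed(42)

# Simulate x_t = w_t + theta * w_{t-S}
w = [random.gauss(0, 1) for _ in range(n + S)]
x = [w[t] + theta * w[t - S] for t in range(S, n + S)]

r = acf(x, 3 * S)
threshold = 2 / n ** 0.5  # rough significance band for a sample autocorrelation
for lag in (1, 2, 3, S, 2 * S, 3 * S):
    flag = "significant" if abs(r[lag]) > threshold else "-"
    print(f"lag {lag:2d}: ACF={r[lag]:+.3f}  {flag}")
```

With a series like this, one would expect a clear spike at lag 12 and small values at lags 24 and 36. That is the "cuts off after lag q" pattern applied to the seasonal lags, and it would suggest a seasonal MA order of 1.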
Thank You. Hope you found this useful. 🙂