library(ggfortify)
library(forecast)
path_fig = "~/Dropbox/Evry/TimeSeries/Cours/Figures"
We consider the analysis of quarterly U.S. GNP from 1947(1) to 2002(3), n = 223 observations. The data are real U.S. gross national product in billions of chained 1996 dollars and have been seasonally adjusted. The data were obtained from the Federal Reserve Bank of St. Louis (http://research.stlouisfed.org/). This example is from Peter J Brockwell and Richard A Davis, Time series : theory and methods, Springer Science & Business Media, 2013.
Here are the steps to build an ARIMA model for the gnp
series of astsa
library.
library(astsa)
gnp
autoplot(gnp)
There is a clear trend, the variability is perhaps increasing with the mean. We’ll the differentiation on log(gnp)
and gnp
ggAcf(gnp)
The ACFs is, as expected, (too) slowly decreasing.
autoplot(log(gnp))
ggAcf(log(gnp))
gnp
autoplot(diff(gnp))
ggAcf(diff(gnp))
This is still not stationary: neither at order 1 (mean), nor at order 2 (variance). Let’s try higher orders.
autoplot(diff(gnp,2))
ggAcf(diff(gnp,2))
log(gnp)
autoplot(diff(log(gnp)))
ggAcf(diff(log(gnp)))
This looks reasonability stationnary at orders 1 and 2. We now construct an ARMA model for diff(log(gnp))
diff(log(gnp))
ggAcf(diff(log(gnp)))
ggPacf(diff(log(gnp)))
The ACF calls for an MA(2), while PACF calls for an AR(1). We’ll then estimate in ARIMA(1,1,0) and ARIMA(0,1,2) models for log(gnp)
.
Caution: the default for the option include.drift
in the Arima
function of the forecast
package is FALSE, but in our case the diff(log(gnp))
is clearly not centered
autoplot(diff(log(gnp)))
arima_110 = Arima(log(gnp),order = c(1,1,0),include.constant=TRUE)
arima_012 = Arima(log(gnp),order = c(0,1,2),include.constant=TRUE)
criteria = matrix(c(arima_110$aicc,arima_012$aicc,arima_110$bic,arima_012$bic),nrow=2,byrow = TRUE)
colnames(criteria) = c("arima_110","arima_012")
rownames(criteria) = c("aicc","bic")
criteria
arima_110 arima_012
aicc -1431.110 -1431.745
bic -1421.012 -1418.318
check = checkresiduals(arima_110 )
Ljung-Box test
data: residuals
Q* = 9.8183, df = 6, p-value = 0.1325
Model df: 2. Total lags used: 8
auto.arima(log(gnp))
set.seed(666)
omega =rnorm(n=1000)
X = ts(omega[13:1000] - 0.5 * omega[1:988], start = c(1900,1), frequency = 12)
autoplot(X)
ggAcf(X)
dev.print(pdf,paste(c(path_fig,"acf_pure_season_MA1.pdf"),collapse="/"))
quartz_off_screen
2
ggPacf(X)
dev.print(pdf,paste(c(path_fig,"pacf_pure_season_MA1.pdf"),collapse="/"))
quartz_off_screen
2
phi = c(rep(0,11),.8)
ACF = ARMAacf(ar=phi, ma=-.5, 50)[-1] # [-1] removes 0 lag
PACF = ARMAacf(ar=phi, ma=-.5, 50, pacf=TRUE)
par(mfrow=c(1,2), mar=c(2.5,2.5,2,1), mgp=c(1.6,.6,0))
plot(ACF, type="h", xlab="LAG", ylim=c(-.4,.8), axes=FALSE)
abline(h=0)
axis(1, seq(0,50,by=12))
axis(2)
box()
plot(PACF, type="h", xlab="LAG", ylim=c(-.4,.8), axes=FALSE)
abline(h=0)
axis(1, seq(0,50,by=12))
axis(2)
box()
dev.print(pdf,paste(c(path_fig,"sarmaacf.pdf"),collapse="/"))
quartz_off_screen
2
http://www.personal.psu.edu/asb17/old/sta4853/files/sta4853-17print.pdf
autoplot(prodn)
dev.print(pdf,paste(c(path_fig,"prodn.pdf"),collapse="/"))
quartz_off_screen
2
autoplot(diff(prodn))
dev.print(pdf,paste(c(path_fig,"diffprodn.pdf"),collapse="/"))
quartz_off_screen
2
monthplot(diff(prodn))
dev.print(pdf,paste(c(path_fig,"months_diffprodn.pdf"),collapse="/"))
quartz_off_screen
2
ggAcf(diff(prodn),lag.max = 40)
dev.print(pdf,paste(c(path_fig,"acf_diffprodn.pdf"),collapse="/"))
quartz_off_screen
2
ggPacf(diff(prodn),lag.max = 40)
dev.print(pdf,paste(c(path_fig,"pacf_diffprodn.pdf"),collapse="/"))
quartz_off_screen
2