Time Series

A time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time; thus it is a sequence of discrete-time data. Time series are very frequently plotted via line charts. They are used in statistics, signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting, intelligent transport and trajectory forecasting, earthquake prediction, electroencephalography, control engineering, astronomy, communications engineering, and largely in any domain of applied science and engineering that involves temporal measurements. Time series data have a natural temporal ordering. This makes time series analysis distinct from cross-sectional studies, in which there is no natural ordering of the observations (e.g. explaining people’s wages by reference to their respective education levels, where the individuals’ data could be entered in any order).

Time series analysis is also distinct from spatial data analysis where the observations typically relate to geographical locations (e.g. accounting for house prices by the location as well as the intrinsic characteristics of the houses). A stochastic model for a time series will generally reflect the fact that observations close together in time will be more closely related than observations further apart. In addition, time series models will often make use of the natural one-way ordering of time so that values for a given period will be expressed as deriving in some way from past values, rather than from future values.
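One way to see that nearby observations tend to be more closely related is to compute sample autocorrelations with base R's `acf()`. The sketch below uses simulated data (an AR(1) process with coefficient 0.8, chosen purely for illustration and not taken from this text); for such a series the correlation between observations typically shrinks as the lag between them grows.

```r
set.seed(1)
# Simulated AR(1) series: each value is 0.8 times the previous value plus noise,
# so observations that are close in time are strongly related.
x <- arima.sim(model = list(ar = 0.8), n = 1000)

# Sample autocorrelations at lags 0 to 5 (the lag-0 value is always 1).
r <- as.numeric(acf(x, lag.max = 5, plot = FALSE)$acf)
round(r, 2)
```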

The clearest way to examine a regular time series manually is with a line chart such as the one shown for tuberculosis in the United States, made with a spreadsheet program. The number of cases was standardized to a rate per 100,000 people, and the percent change per year in this rate was calculated. The nearly steadily dropping line shows that TB incidence decreased in most years, but the percent change in this rate varied by as much as ±10%, with ‘surges’ in 1975 and around the early 1990s. Using both the left and right vertical axes allows two time series to be compared in one graphic.
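The two calculations described for the tuberculosis chart (a rate per 100,000 and a year-over-year percent change in that rate) can be sketched in base R as follows. The case counts and populations are made-up placeholder numbers, not the US TB data.

```r
# Made-up counts and populations, for illustration only
cases      <- c(200, 180, 190, 171)
population <- c(1.0e6, 1.0e6, 1.0e6, 0.9e6)

# Standardize to a rate per 100,000 people
rate <- cases / population * 1e5

# Percent change in the rate from each year to the next
pct_change <- 100 * diff(rate) / head(rate, -1)

rate        # 20 18 19 19
pct_change  # one fewer value than rate: the first year has no previous year
```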

Smoothing Time Series

There are various fairly simple smoothing/averaging methods. Two are “ordinary moving averages” and “exponentially weighted moving averages.”

Ordinary Moving Averages

For a “span” of k periods, the moving average through time t is

$$\tilde{y}_t = \frac{y_t + y_{t-1} + y_{t-2} + \cdots + y_{t-k+1}}{k}$$

that is, the mean of the k most recent observations.

Where seasonal effects are expected, it is standard to use k = the number of periods per cycle (e.g. k = 4 for quarterly data), so that each average covers exactly one full cycle.
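A minimal sketch of the k-period moving average in base R, assuming the series is a plain numeric vector; `moving_average` is a hypothetical helper name. `stats::filter()` with `sides = 1` multiplies the current value and the k − 1 values before it by the weights 1/k and sums them.

```r
# Trailing k-period moving average: mean of y[t], y[t-1], ..., y[t-k+1].
# The first k - 1 results are NA because fewer than k observations exist yet.
moving_average <- function(y, k) {
  as.numeric(stats::filter(y, rep(1 / k, k), sides = 1))
}

y <- c(1, 2, 3, 4, 5, 6, 7, 8)
moving_average(y, k = 4)  # NA NA NA 2.5 3.5 4.5 5.5 6.5
```

Calling `stats::filter` explicitly avoids a clash with `dplyr::filter` if that package happens to be loaded.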

Exponentially Weighted Moving Averages

These weight observations less heavily as one moves back in time from the current period. They are typically computed “recursively” as

$$\tilde{y}_t = w\,y_t + (1 - w)\,\tilde{y}_{t-1}$$

where $\tilde{y}_t$ is the exponentially weighted moving average (EWMA) at time t and the weight w lies between 0 and 1.

($\tilde{y}_{t-1}$ is the EWMA from the previous period, and the current EWMA is a compromise between the previous EWMA and the current observation.) One must start this recursion somewhere, and it is common to take $\tilde{y}_1 = y_1$. Notice that w = 1 does no smoothing, while w = 0 smooths so much that the EWMA never changes (i.e. all the values are equal to the first).
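A minimal sketch of the EWMA recursion in base R, starting from the first observation as described; `ewma` is a hypothetical helper name, and an explicit loop is used so each step mirrors the recursive formula.

```r
# Exponentially weighted moving average, computed recursively:
#   e[1] = y[1]
#   e[t] = w * y[t] + (1 - w) * e[t - 1]   for t >= 2
ewma <- function(y, w) {
  stopifnot(w >= 0, w <= 1)
  e <- numeric(length(y))
  e[1] <- y[1]
  for (t in seq_along(y)[-1]) {
    e[t] <- w * y[t] + (1 - w) * e[t - 1]
  }
  e
}

ewma(c(10, 20, 30), w = 0.5)  # 10.0 15.0 22.5
```

The two extreme cases behave as the text says: `ewma(y, 1)` returns `y` unchanged, and `ewma(y, 0)` repeats `y[1]` throughout.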

Exercise/Example: Table 13.1 (page 13-5) of the text gives quarterly retail sales for JC Penney, 1996-2001 (in millions of dollars). “By hand,” (1) using k = 4, find the ordinary moving averages for periods 5 through 8; then (2) using, e.g., w = 0.3, find the exponentially weighted moving average values for those periods.

A plot of both the original time series and the k = 4 moving averages for the JC Penney data is in Figure 13.13, page 13-28 of the text. Here is a JMP “Overlay Plot” version of this picture and an indication of how you can get JMP to make the MAs.

WHAT IS THE TREND?

The ABS (Australian Bureau of Statistics) defines the trend as the ‘long-term’ movement in a time series without calendar-related and irregular effects; it reflects the underlying level of the series. It is the result of influences such as population growth, price inflation and general economic changes. The following graph depicts a series in which there is an obvious upward trend over time.
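The ABS's own trend estimates are not reproduced here. As a swapped-in illustration only, base R's `decompose()` performs a classical decomposition whose trend component is a centered moving average; applied to the built-in monthly `AirPassengers` data, it recovers a clear upward trend.

```r
# Classical decomposition of the built-in monthly AirPassengers series.
# The trend is a centered 12-month moving average, so the first and last
# 6 months have no trend estimate (NA).
parts <- decompose(AirPassengers)
trend <- parts$trend

# Trend at the start and end of the range where it is defined
trend_values <- na.omit(as.numeric(trend))
c(first = head(trend_values, 1), last = tail(trend_values, 1))
```

This is not necessarily the method the ABS uses; it simply separates a smooth long-term level from the seasonal and irregular parts of the series.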

ARIMA Models using R:

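To forecast a time series with an ARIMA(p,d,q) model, we need to choose the orders p (autoregressive terms), d (degree of differencing) and q (moving-average terms); the differencing order d lets the model handle series that only become stationary after differencing. The auto.arima() function from the forecast package searches over candidate (p,d,q) values and returns the model that scores best on an information criterion (AICc by default).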

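library(forecast)   # provides auto.arima() and forecast()
auto.arima(ts[,2])  # fit to the second column of the data object named ts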

Series: ts[, 2]
ARIMA(3,1,1) with drift

Coefficients:
          ar1      ar2      ar3      ma1   drift
      -0.2621  -0.1223  -0.2324  -0.7825  0.2806
s.e.   0.2264   0.2234   0.1798   0.2333  0.1316

sigma^2 estimated as 41.64:  log likelihood=-190.85
AIC=393.7   AICc=395.31   BIC=406.16
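The printed AIC follows from the log likelihood through the usual definition AIC = −2 log L + 2k. Assuming k = 6 estimated parameters (the five coefficients ar1, ar2, ar3, ma1 and drift, plus the innovation variance sigma^2), the arithmetic reproduces the reported value:

```latex
\mathrm{AIC} = -2\log L + 2k = -2(-190.85) + 2(6) = 381.70 + 12 = 393.70
```

AICc and BIC add penalties that also depend on the number of observations used in fitting, which this output does not show. When comparing candidate models fitted to the same data, lower values are preferred.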

Forecast time series

Now we use the forecast() function to forecast future values.

forecast(auto.arima(dif_data))

   Point Forecast     Lo 80      Hi 80     Lo 95    Hi 95
61   -3.076531531 -5.889584 -0.2634795 -7.378723 1.225660
62    0.231773625 -2.924279  3.3878266 -4.594993 5.058540
63    0.702386360 -2.453745  3.8585175 -4.124500 5.529272
64   -0.419069906 -3.599551  2.7614107 -5.283195 4.445055
65    0.025888991 -3.160496  3.2122736 -4.847266 4.899044
66    0.098565814 -3.087825  3.2849562 -4.774598 4.971729
67   -0.057038778 -3.243900  3.1298229 -4.930923 4.816846
68    0.002733053 -3.184237  3.1897028 -4.871317 4.876783
69    0.013817766 -3.173152  3.2007878 -4.860232 4.887868
70   -0.007757195 -3.194736  3.1792219 -4.881821 4.866307

plot(forecast(auto.arima(dif_data))) 
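Under the assumption of normally distributed forecast errors, each interval in the forecast output is the point forecast ± z × (forecast standard error), where z is the standard normal quantile for the level (about 1.28 for 80% and 1.96 for 95%). The sketch below, in base R with illustrative variable names, recovers the standard error for period 61 from the printed 80% lower bound and uses it to rebuild the 95% interval.

```r
# Values copied from the period-61 row of the forecast output
point <- -3.076531531
lo80  <- -5.889584

# Recover the forecast standard error from the 80% lower bound
se <- (point - lo80) / qnorm(0.90)

# Rebuild the 95% interval from the same standard error
lo95 <- point - qnorm(0.975) * se
hi95 <- point + qnorm(0.975) * se
round(c(se = se, lo95 = lo95, hi95 = hi95), 4)
```

The rebuilt bounds agree with the printed Lo 95 (-7.378723) and Hi 95 (1.225660) up to rounding, and the widening of the intervals in later rows reflects the growing standard error at longer horizons.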