Time Series Analysis

Introduction

Resources

  • Time Series Analysis by James D. Hamilton - considered the bible of time series analysis; it covers nearly all of the theory, but contains no code or pseudocode

  • Introductory Time Series with R by Cowpertwait & Metcalfe

  • Analysis of Financial Time Series by Tsay

Aim and Motivation

  • Exploring time series analysis in the context of finance
  • Implementing code from the book by Tsay, using the books by Hamilton and by Cowpertwait & Metcalfe as references
  • Aim: understand the different time series models in R.
  • (Will keep expanding this section as we go)

Financial Time Series and Linear Time series

Asset Returns

Definition 1 (Simple gross return)
Arithmetic Returns

$P_t = P_{t-1}(1+R_t) \implies R_t = \frac{P_t - P_{t-1}}{P_{t-1}}$, where $P_t$ and $R_t$ are the price and simple return at time $t$ respectively
Continuously compounded return or log return

$r_t = \ln(1+R_t)$
Dividend Payment

$r_t = \ln(P_t + D_t) - \ln(P_{t-1})$, where $D_t$ is the dividend payment between time $t-1$ and time $t$.
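The relation between simple and log returns can be checked numerically; a minimal sketch, using a short hypothetical price series made up for illustration:

```r
# Hypothetical price series for illustration
P <- c(100, 102, 101, 105)
R <- diff(P) / head(P, -1)   # simple returns R_t = (P_t - P_{t-1}) / P_{t-1}
r <- diff(log(P))            # log returns r_t = ln(P_t) - ln(P_{t-1})
all.equal(r, log(1 + R))     # confirms r_t = ln(1 + R_t)
```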


Definition 2 (Multiperiod gross return)
Similarly, we can define the multiperiod simple return through the one-period gross returns: $1 + R_t[k] = \prod_{j=0}^{k-1}(1+R_{t-j})$, i.e. $R_t[k] = \prod_{j=0}^{k-1}(1+R_{t-j}) - 1$, where $k$ is the total number of periods the asset was held for. The corresponding annualized return is $\exp\left[\frac{1}{k}\sum_{j=0}^{k-1}\ln(1+R_{t-j})\right] - 1$, where $\exp$ is the exponential function.

The $k$-period log return is the sum of the one-period log returns: $r_t[k] = r_t + r_{t-1} + \cdots + r_{t-k+1}$.
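A quick numerical check of the multiperiod relations, again with hypothetical prices:

```r
P <- c(100, 102, 101, 105)     # hypothetical prices over k = 3 holding periods
R <- diff(P) / head(P, -1)     # one-period simple returns
r <- diff(log(P))              # one-period log returns
Rk <- prod(1 + R) - 1          # k-period simple return: product of gross returns, minus 1
rk <- sum(r)                   # k-period log return: sum of one-period log returns
all.equal(Rk, P[4] / P[1] - 1) # gross returns telescope to the total price change
all.equal(rk, log(1 + Rk))     # the two multiperiod measures are consistent
```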

Definition 3 (Portfolio Return) $R_{p,t} = \sum_{i=1}^{N} w_i R_{it}$

where $p$ denotes the portfolio, $i$ the $i$th asset, and $w_i$ the weight of the $i^{th}$ asset in the portfolio.
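A one-line sketch of the portfolio return, with made-up weights and returns:

```r
w <- c(0.5, 0.3, 0.2)        # hypothetical weights of N = 3 assets, summing to 1
R <- c(0.02, -0.01, 0.015)   # hypothetical one-period simple returns R_{it}
Rp <- sum(w * R)             # R_{p,t} = sum_i w_i R_{it}
Rp                           # 0.01
```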

* Load the data; header = T reads the first row of the data file as the column names of the data set
# da contains the daily returns for IBM and three indexes, namely the value-weighted, equal-weighted, and S&P composite indexes, from 1970 to 2008
library(fBasics)
da = read.table("TST/d-ibm3dx7008.txt",header = T) 
dim(da) # dimensions for the data
[1] 9845    5
da[1:5,] # first five rows
      Date      rtn    vwretd    ewretd    sprtrn
1 19700102 0.000686  0.012137  0.033450  0.010211
2 19700105 0.009596  0.006375  0.018947  0.004946
3 19700106 0.000679 -0.007233 -0.005776 -0.006848
4 19700107 0.000678 -0.001272  0.003559 -0.002047
5 19700108 0.002034  0.000564  0.002890  0.000540
tail(da) # last six rows
         Date       rtn    vwretd    ewretd    sprtrn
9840 20081223 -0.016953 -0.007618 -0.008332 -0.009717
9841 20081224 -0.000993  0.004463  0.005254  0.005781
9842 20081226  0.010060  0.007170  0.011629  0.005356
9843 20081229 -0.000984 -0.004481 -0.016514 -0.003873
9844 20081230  0.028308  0.024951  0.021692  0.024407
9845 20081231  0.007301  0.017513  0.036731  0.014158
ibm = da[,2] # IBM simple returns

sibm = ibm*100 #Percentage simple returns 

b = basicStats(sibm)

s1 = skewness(sibm)
t = s1/sqrt(6/9845) # t-ratio for skewness; its asymptotic variance is 6/N, with N = 9845
pv  = 2*(1-pnorm(t)) # Calculate p-value

cat("skewness =", s1,"\nTest Statistic = ",t,"\np-value = ", pv)
skewness = 0.06139878 
Test Statistic =  2.487093 
p-value =  0.01287919
library(ggplot2)
ibm = as.data.frame(da[,1:2])

ibm["log_return"] = log(1+ibm["rtn"])
colnames(ibm)[2] <- "simple_return"

ggplot(ibm,aes(x = simple_return,colour="Empirical"))+
      ggtitle("Simple return distribution") + 
      xlim (-0.15,0.15)+
      geom_density()+
      stat_function(mapping = aes(colour ="Normal"),
                   fun = dnorm,args = with(ibm,c(mean =mean(simple_return),
                   sd = sd(simple_return))))
ggplot(ibm,aes(x = log_return,colour = "Empirical"))+
      ggtitle("Log return distribution")+
      geom_density()+
      stat_function(mapping = aes(colour = "Normal"),
             fun = dnorm,args = with(ibm,c(mean = mean(log_return),
             sd = sd(log_return))))+
      scale_x_continuous("log return",limits = c(-0.15,0.15))

#IBM = ibm[, 2] # IBM simple return values
#IBM = ts(IBM,frequency = 250 ,start = c(1970,1))

#plot.ts(IBM)


#acf(IBM,lag=250,ylim= c(-0.05,0.05)) # ACF is covered in detail below; the x-axis is in fractions of a unit of time (years here, since frequency = 250)

#libm = ibm[,2]*100
#length(libm)
#mean(libm)
#t.test(libm)
#k1 = kurtosis(libm)
#s1 =skewness(libm)
#t3  = (s1^2)/(6/9845)
#t2 = ((k1)^2)/(24/9845)
#a = t2+t3
#a
#normalTest(libm,method = 'jb')

Linear Time Series

Stationarity

The foundation of time series analysis is stationarity. A time series $\{r_t\}$ is said to be strictly stationary if the joint distribution of $(r_{t_1}, r_{t_2}, \ldots, r_{t_k})$ is identical to that of $(r_{t_1+t}, r_{t_2+t}, \ldots, r_{t_k+t})$ for all $t$, where $k$ is an arbitrary positive integer and $(t_1, t_2, \ldots, t_k)$ is a collection of $k$ positive integers. Strict stationarity is very hard to verify empirically.
A time series is said to be weakly stationary if the mean of $r_t$ and the covariance between $r_t$ and $r_{t-l}$ are time invariant, where $l$ is an arbitrary integer. Weak stationarity enables us to make inferences about future observations:

  • $\mathbb{E}(r_t) = \mu$
  • $Cov(r_t, r_{t-l}) = \gamma_l$, which depends only on $l$

Weak stationarity is commonly studied and more practical.

The covariance $\gamma_l = Cov(r_t, r_{t-l})$ is called the lag-$l$ autocovariance of $r_t$. From the above definition it follows that

  • $\gamma_0 = Var(r_t)$
  • $\gamma_{-l} = \gamma_l$
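These properties can be spot-checked on simulated data; a small sketch, noting that acf() with type = "covariance" uses a $1/T$ divisor:

```r
set.seed(3)
r <- rnorm(300)                                 # simulated iid series
g <- acf(r, type = "covariance", lag.max = 2, plot = FALSE)$acf
all.equal(g[1], mean((r - mean(r))^2))          # gamma_0 equals the (1/T) sample variance
```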

Autocorrelation Function

Consider a weakly stationary return series $r_t$. When the linear dependence between $r_t$ and its past values $r_{t-l}$ is of interest, the concept of correlation is generalised to autocorrelation. The correlation coefficient between $r_t$ and $r_{t-l}$ is called the lag-$l$ autocorrelation of $r_t$ and is commonly denoted by $\rho_l$, which under the weak stationarity assumption is a function of $l$ only. It is defined by $\rho_l = \frac{Cov(r_t, r_{t-l})}{\sqrt{Var(r_t)Var(r_{t-l})}} = \frac{Cov(r_t, r_{t-l})}{Var(r_t)} = \frac{\gamma_l}{\gamma_0}$, where the second equality uses $Var(r_t) = Var(r_{t-l})$ under weak stationarity.

It is easy to see that

  • $\rho_0 = 1$
  • $\rho_l = \rho_{-l}$
  • $-1 \leq \rho_l \leq 1$

For a given sample of returns $\{r_t\}_{t=1}^T$, let $\bar r$ be the sample mean. Then the lag-1 sample autocorrelation of $r_t$ is $\hat\rho_1 = \frac{\sum_{t=2}^{T}(r_t - \bar r)(r_{t-1} - \bar r)}{\sum_{t=1}^{T}(r_t - \bar r)^2}$
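A minimal sketch computing the lag-1 sample autocorrelation from first principles on simulated data; R's acf() normalises the denominator over all $T$ terms, so the two agree exactly:

```r
set.seed(1)
r <- rnorm(500)          # simulated iid returns
T <- length(r)
rbar <- mean(r)
# lag-1 autocorrelation from the definition
rho1_hat <- sum((r[2:T] - rbar) * (r[1:(T-1)] - rbar)) / sum((r - rbar)^2)
acf1 <- acf(r, lag.max = 1, plot = FALSE)$acf[2]   # same estimator, via acf()
all.equal(rho1_hat, acf1)
```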

The conditions under which $\hat\rho_1$ is a consistent estimator of $\rho_1$ are as follows:

  • the returns $\{r_t\}$ form an independent and identically distributed sequence, and
  • $\mathbb{E}(r_t^2) < \infty$.

Then $\hat\rho_1$ is asymptotically normal with mean zero and variance $1/T$.

Testing

Let $H_0 : \rho_1 = 0$ be the null hypothesis versus $H_a : \rho_1 \neq 0$ the alternative. The test statistic is the usual $t$-ratio (see the statistics section for details), $\sqrt{T}\hat\rho_1$, which asymptotically follows the standard normal distribution. The null hypothesis $H_0$ is rejected if the $t$-ratio is large in magnitude or, equivalently, if its $p$-value is small, say less than 0.05.
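The test can be sketched in a few lines on simulated iid data, where $\rho_1 = 0$ holds by construction:

```r
set.seed(2)
r <- rnorm(1000)                           # iid data, so rho_1 = 0 by construction
T <- length(r)
rho1_hat <- acf(r, lag.max = 1, plot = FALSE)$acf[2]
t_ratio <- sqrt(T) * rho1_hat              # asymptotically N(0,1) under H0
p_value <- 2 * (1 - pnorm(abs(t_ratio)))   # two-sided p-value
p_value                                    # a small p-value (< 0.05) would reject H0
```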

set.seed(666)
phi = c(rep(0,11),.9) # AR(12) with phi_1 = ... = phi_11 = 0 and phi_12 = 0.9, i.e. a seasonal AR(1) with period 12
sAR = arima.sim(list(order=c(12,0,0), ar=phi), n=37)
sAR = ts(sAR, freq=12)
layout(matrix(c(1,1,2, 1,1,3), nc=2))
par(mar=c(3,3,2,1), mgp=c(1.6,.6,0))
plot(sAR, axes=FALSE, main='seasonal AR(1)', xlab="year", type='c')
Months = c("J","F","M","A","M","J","J","A","S","O","N","D") 
points(sAR, pch=Months, cex=1.25, font=4, col=1:4)
axis(1, 1:4); abline(v=1:4, lty=2, col=gray(.7))
axis(2); box()
ACF = ARMAacf(ar=phi, ma=0, 100)
PACF = ARMAacf(ar=phi, ma=0, 100, pacf=TRUE)
plot(ACF,type="h", xlab="LAG", ylim=c(-.1,1)); abline(h=0) 
plot(PACF, type="h", xlab="LAG", ylim=c(-.1,1)); abline(h=0)