Statistics

Introduction

This is majorl ## Standard Definitions

Expected Value \(\mathbb{E}[X]\) is given by

\[\mathbb{E}[X] = \int_{-\infty}^{\infty}x f(x) dx\] Variance\(Var(X)\) is given by

\[Var(X) = \mathbb{E}(X^2) - \mathbb{E}{{(X)}^2}\] \[Var(X) = \int_{-\infty}^{\infty}(x-\mathbb{E}[X])^2 f_X(x) dx\] Higher Moments \(\mathbb{E}(X^n)\) is given by

\[\mathbb{E}(X^n) = \int_{-\infty}^{\infty}x^n f_X(x) dx \] Characteristic function(CHF) \(\phi_X(u)\) for \(u \in \mathbb{R}\) is given by

\[\phi_X(u) = \mathbb{E}[e^{iuX}] = \int_{-\infty}^{\infty}e^{iuX}f(x)dx \]

Moment generating function\(\mathcal{M}_X(u)\) is given by \[\mathcal{M}_X(u) = \phi_X(-iu)= \mathbb{E}[e^{uX}] = \int_{-\infty}^{\infty}e^{ux}f(x)dx \] Cumulant characteristic function \(\zeta_X(u)\) is given by \[\zeta_X(u) = log\mathbb{E}[e^{iux}] = log\phi_X(u)\]

Central moments$ _l$ is given by \(\mathbb{E}[(X-\mu)^l]\)

Skewness \(S(x)\) and Kurtosis \(K(x)\) are the normalised \(3^{rd}\) and \(4^{th}\) central moments of a distribution respectively. The normalization factors are \(\sigma^3\) and \(\sigma^4\) respectively where \(\sigma\) is the standard deviation of X.

The quantity \(K(x) - 3\) is called the excess kurtosis since \(K(x) = 3\) is the kurtosis for a normal distribution.

Let \(\{x_1,x_2,x_3 ....x_T\}\) be a random sample of X with T observations

Sample Mean\(\hat\mu_x\) is given by \[\frac{\sum_{t=1}^Tx_t}{T}\] Sample Variance\(\hat\sigma_x\) is given by \[\frac{\sum_{t=1}^T(x_t - \hat\mu_x)^2}{T-1}\] Sample Skewness\(\hat S_x\) is given by \[\frac{\sum_{t=1}^T(x_t - \hat\mu_x)^3}{(T-1)\hat\sigma_x^3}\] Sample Kurtosis\(\hat K_x\) is given by \[\frac{\sum_{t=1}^T(x_t - \hat\mu_x)^4}{(T-1)\hat\sigma_x^4}\]

Univaiate Distributions

Normal Distribution

A random variable \(X\) is said to be normally distrbuted if it has a probability density function as follows

\[f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-{\frac{1}{2}}(\frac{x-\mu}{\sigma})^2}\]

It is a continous probability distribution

\(\mu\) and \(\sigma\) are the mean and variance of the distribution respectively

The case where \(\mu =0\) and \(\sigma = 1\) is called standard normal distribution and its PDF is given by \[ f_X(x) =\frac{1}{\sqrt{2\pi}}e^{\frac{-x^2}{2}}\]

import numpy as np
import math 
import matplotlib.pyplot as plt
import scipy.stats as st
from mpl_toolkits import mplot3d


def plotNormalPDF_CDF_CHF(mu ,sigma):
    i = complex(0,1)
    chf = lambda u : np.exp(i*mu*u -(sigma**2)*u*u/2)
    pdf = lambda x : st.norm.pdf(x,mu,sigma)
    cdf = lambda x : st.norm.cdf(x,mu,sigma)
    
    x = np.linspace(5,15,100)
    u = np.linspace(0,5,250)
    print(type(pdf))
    # figure 1 ,PDF
    plt.figure(1)
    plt.plot(x,pdf(x))
    plt.grid()
    plt.xlabel('x')
    plt.ylabel('PDF')
  
    # figure 2 ,CDF
    plt.figure(2)
    plt.plot(x,cdf(x))
    plt.grid()
    plt.xlabel('x')
    plt.ylabel('CDF')
  
    #  figure 3 ,CHF
  
    plt.figure(3)
    ax = plt.axes(projection = '3d')
    chfV = chf(u)
  
    x = np.real(chfV)
    y = np.imag(chfV)
    ax.plot3D(u,x,y,'red')
    ax.view_init(30 ,-120)
    
plotNormalPDF_CDF_CHF(10,1)
<class 'function'>

Log Normal Distibution

A random Variable \(X\) is said to have log normal distibution if \(Y = \ln{X}\) and \(Y\) is normally distributed.

The PDF of log normal distribution is given by

\[f_X(x) = \frac{1}{x\sigma\sqrt{2\pi}}e^{(-\frac{(\ln{x} -\mu)^2}{2{\sigma}^2})}\] where \(\mu\) and \(\sigma\) are the mean and variance of \(Y(\ln X)\) respectively.

Hence the mean \(\mu^*\) and variance \(\sigma^*\) of X are as follows

\[\mu^* = e^{\mu + \frac{1}{2}\sigma^2}\] \[\sigma^* = e^{2\mu + 2\sigma^2} - e^{2\mu +\sigma^2}\] Important thing to note here is that \(x\) can take values in \((0,\infty)\) only.

Multivariate Distributions

Correlation

The correlation coefficient between two random variables \(X\) and \(Y\) is defined as \[ \rho_{x,y} = \frac{Cov(X,Y)}{\sqrt{Var(X)Var(Y)}} = \frac{E[(X-\mu_x)(Y-\mu_y)]}{\sqrt{E(X-\mu_x)^2E(Y-\mu_y)^2}}\]

The sample correlation is given by \[ \hat\rho_{x,y} = \frac{\sum_{t=1}^{T}(x_t - \bar{x})(y_t - \bar{y})}{\sqrt{\sum_{t=1}^T(x_t - \bar{x})\sum_{t=1}^T(y_t - \bar{y})}}\]

Two-dimensional densities.

The joint CDF of two random variables ,\(X\) and \(Y\) ,is the function \(F_{X,Y}(.,.):\mathbb{R}^2 \rightarrow [0,1]\),which is defined by:

\[ F_{X,Y}(x,y) = \mathbb{P}[X\leq{x},Y\leq{y}]\] If \(X\) and \(Y\) are continous variables, then the joint PDF of X and Y is a function of \[f_{X,Y}(x,y) = \frac{\partial^2{F_{X,Y}(x,y)}}{\partial{x}\partial{y}} \] Bivariate Normal density functions

\(X = [X,Y]^T\) and \[X \sim \mathcal{N}(\begin{bmatrix}0\\0\end{bmatrix},\begin{bmatrix}1 , \rho \\ \rho ,1\end{bmatrix}) \]

import numpy as np
import matplotlib.pyplot as plt
#from matplotlib.mlab import bivariate_normal bivariate_normal seems to be deprecated

def bivariate_normal(X, Y, sigmax=1.0, sigmay=1.0,
                     mux=0.0, muy=0.0, sigmaxy=0.0):
    """
    Bivariate Gaussian distribution for equal shape *X*, *Y*.
    See `bivariate normal
    <http://mathworld.wolfram.com/BivariateNormalDistribution.html>`_
    at mathworld.
    """
    Xmu = X-mux
    Ymu = Y-muy

    rho = sigmaxy/(sigmax*sigmay)
    z = Xmu**2/sigmax**2 + Ymu**2/sigmay**2 - 2*rho*Xmu*Ymu/(sigmax*sigmay)
    denom = 2*np.pi*sigmax*sigmay*np.sqrt(1-rho**2)
    return np.exp(-z/(2*(1-rho**2))) / denom

def BivariateNormalPDFPlot():
  # Number of points in each direction
      n = 40;
      
      # parameters
      mu_1 = 0;
      mu_2 = 0;
      sigma_1=1;
      sigma_2=0.5;
      rho1=0.0
      rho2=-0.8
      rho3=0.8
      
      x = np.linspace(-3.0,3.0,n)
      y = np.linspace(-3.0,3.0,n)
      X,Y =np.meshgrid(x,y)
      Z = lambda rho:bivariate_normal(X,Y,sigma_1,sigma_2,mu_1,mu_2,rho*sigma_1*sigma_2)
      
      fig =plt.figure(1)
      ax = fig.add_subplot(projection= '3d')
      ax.plot_surface(X, Y, Z(rho1),cmap='viridis',linewidth=0)
      ax.set_xlabel('X axis')
      ax.set_ylabel('Y axis')
      ax.set_zlabel('Z axis')
      plt.show()
      
      fig =plt.figure(2)
      ax = fig.add_subplot(projection= '3d')
      ax.plot_surface(X, Y, Z(rho2),cmap='viridis',linewidth=0)
      ax.set_xlabel('X axis')
      ax.set_ylabel('Y axis')
      ax.set_zlabel('Z axis')
      plt.show()
      
      fig =plt.figure(3)
      ax = fig.add_subplot(projection= '3d')
      ax.plot_surface(X, Y, Z(rho3),cmap='viridis',linewidth=0)
      ax.set_xlabel('X axis')
      ax.set_ylabel('Y axis')
      ax.set_zlabel('Z axis')
      plt.show()
  
BivariateNormalPDFPlot()

Hypothesis Testing

t-statistic is the ratio of departure of the estimated value of a paramater from its hypothesized value to it’s standard error.

It is used when the sample size is small or the population standard deviation is unknown.

Let \(\hat\beta\) be an estimator of parameter \(\beta\) in some statistical model. Then the t-statistic is given by \[ t_{\hat\beta} = \frac{\hat\beta - \beta_0}{s.e(\hat\beta)}\] where \(s.e(\hat\beta)\) is the standard error of the estimator \(\hat\beta\) for \(\beta\) and \(\beta_0\) is a non-random , know constant , which may or maynot match actual unknow parameter value \(\beta\)