Statistics

Introduction

Standard Definitions

Expected Value $E[X]$ is given by

$$E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx$$

Variance $\mathrm{Var}(X)$ is given by

$$\mathrm{Var}(X) = E(X^2) - E(X)^2$$

$$\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - E[X])^2 f_X(x)\,dx$$

Higher moments $E(X^n)$ are given by

$$E(X^n) = \int_{-\infty}^{\infty} x^n f_X(x)\,dx$$

The characteristic function (CHF) $\phi_X(u)$ for $u \in \mathbb{R}$ is given by

$$\phi_X(u) = E[e^{iuX}] = \int_{-\infty}^{\infty} e^{iux} f_X(x)\,dx$$

The moment generating function $M_X(u)$ is given by

$$M_X(u) = \phi_X(-iu) = E[e^{uX}] = \int_{-\infty}^{\infty} e^{ux} f_X(x)\,dx$$

The cumulant characteristic function $\zeta_X(u)$ is given by

$$\zeta_X(u) = \log E[e^{iuX}] = \log \phi_X(u)$$
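As a quick sanity check of these definitions, here is a minimal Monte Carlo sketch (sample size and the value of $u$ are chosen arbitrarily) that approximates the CHF and MGF of a standard normal and compares them with the closed forms $e^{-u^2/2}$ and $e^{u^2/2}$:

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)   # samples of X ~ N(0, 1)
u = 1.5                              # arbitrary evaluation point

# Monte Carlo estimates of E[e^{iuX}] and E[e^{uX}]
chf_mc = np.mean(np.exp(1j * u * x))
mgf_mc = np.mean(np.exp(u * x))

# closed forms for the standard normal
chf_exact = np.exp(-u**2 / 2)
mgf_exact = np.exp(u**2 / 2)

print(chf_mc, chf_exact)   # both close to exp(-1.125) ~ 0.325
print(mgf_mc, mgf_exact)   # both close to exp(1.125) ~ 3.08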

Central moments $\mu_l$ are given by $E[(X-\mu)^l]$

Skewness $S(X)$ and kurtosis $K(X)$ are the normalised 3rd and 4th central moments of a distribution respectively. The normalisation factors are $\sigma^3$ and $\sigma^4$ respectively, where $\sigma$ is the standard deviation of $X$.

The quantity $K(X) - 3$ is called the excess kurtosis, since $K(X) = 3$ is the kurtosis of a normal distribution.

Let $\{x_1, x_2, \ldots, x_T\}$ be a random sample of $X$ with $T$ observations.

Sample mean $\hat{\mu}_x$ is given by $\frac{1}{T}\sum_{t=1}^{T} x_t$

Sample variance $\hat{\sigma}_x^2$ is given by $\frac{1}{T-1}\sum_{t=1}^{T} (x_t - \hat{\mu}_x)^2$

Sample skewness $\hat{S}_x$ is given by $\frac{1}{(T-1)\hat{\sigma}_x^3}\sum_{t=1}^{T} (x_t - \hat{\mu}_x)^3$

Sample kurtosis $\hat{K}_x$ is given by $\frac{1}{(T-1)\hat{\sigma}_x^4}\sum_{t=1}^{T} (x_t - \hat{\mu}_x)^4$
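A minimal sketch of these sample statistics in NumPy, cross-checked against scipy.stats (note that st.kurtosis reports excess kurtosis unless fisher=False is passed):

import numpy as np
import scipy.stats as st

rng = np.random.default_rng(1)
x = rng.standard_normal(100_000)              # random sample of X ~ N(0, 1)
T = len(x)

mu_hat = x.sum() / T                          # sample mean
var_hat = ((x - mu_hat)**2).sum() / (T - 1)   # sample variance
sig_hat = np.sqrt(var_hat)
skew_hat = ((x - mu_hat)**3).sum() / ((T - 1) * sig_hat**3)
kurt_hat = ((x - mu_hat)**4).sum() / ((T - 1) * sig_hat**4)

print(mu_hat, var_hat, skew_hat, kurt_hat)          # ~0, ~1, ~0, ~3
print(st.skew(x), st.kurtosis(x, fisher=False))     # cross-check with scipy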

Univariate Distributions

Normal Distribution

A random variable $X$ is said to be normally distributed if it has a probability density function as follows:

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

It is a continuous probability distribution.

$\mu$ and $\sigma$ are the mean and standard deviation of the distribution respectively.

The case where $\mu = 0$ and $\sigma = 1$ is called the standard normal distribution, and its PDF is given by $f_X(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}$

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as st
from mpl_toolkits import mplot3d  # registers the 3D projection


def plotNormalPDF_CDF_CHF(mu, sigma):
    i = complex(0, 1)
    # characteristic function of N(mu, sigma^2): exp(i*mu*u - sigma^2*u^2/2)
    chf = lambda u: np.exp(i * mu * u - (sigma**2) * u * u / 2)
    pdf = lambda x: st.norm.pdf(x, mu, sigma)
    cdf = lambda x: st.norm.cdf(x, mu, sigma)

    x = np.linspace(mu - 5 * sigma, mu + 5 * sigma, 100)  # grid around the mean
    u = np.linspace(0, 5, 250)

    # figure 1: PDF
    plt.figure(1)
    plt.plot(x, pdf(x))
    plt.grid()
    plt.xlabel('x')
    plt.ylabel('PDF')

    # figure 2: CDF
    plt.figure(2)
    plt.plot(x, cdf(x))
    plt.grid()
    plt.xlabel('x')
    plt.ylabel('CDF')

    # figure 3: CHF traced as a curve in the complex plane against u
    plt.figure(3)
    ax = plt.axes(projection='3d')
    chfV = chf(u)
    ax.plot3D(u, np.real(chfV), np.imag(chfV), 'red')
    ax.view_init(30, -120)

    plt.show()


plotNormalPDF_CDF_CHF(10, 1)

Log Normal Distribution

A random variable $X$ is said to have a log normal distribution if $Y = \ln X$ is normally distributed.

The PDF of the log normal distribution is given by

$$f_X(x) = \frac{1}{x\sigma\sqrt{2\pi}}\, e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}$$

where $\mu$ and $\sigma$ are the mean and standard deviation of $Y = \ln X$ respectively.

Hence the mean $\mu_X$ and variance $\sigma_X^2$ of $X$ are as follows:

$$\mu_X = e^{\mu + \frac{1}{2}\sigma^2}$$

$$\sigma_X^2 = e^{2\mu + 2\sigma^2} - e^{2\mu + \sigma^2}$$

An important thing to note here is that $x$ can take values in $(0, \infty)$ only.
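A minimal numerical check of these formulas: sample $Y \sim N(\mu, \sigma^2)$ with arbitrarily chosen parameters, exponentiate, and compare the sample mean and variance of $X = e^Y$ with the closed-form expressions.

import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 0.5, 0.25                        # mean and std dev of Y = ln X

y = rng.normal(mu, sigma, size=1_000_000)    # Y ~ N(mu, sigma^2)
x = np.exp(y)                                # X is log-normally distributed

mean_exact = np.exp(mu + 0.5 * sigma**2)
var_exact = np.exp(2 * mu + 2 * sigma**2) - np.exp(2 * mu + sigma**2)

print(x.mean(), mean_exact)                  # both ~ 1.70
print(x.var(ddof=1), var_exact)              # both ~ 0.187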

Multivariate Distributions

Correlation

The correlation coefficient between two random variables $X$ and $Y$ is defined as

$$\rho_{x,y} = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} = \frac{E[(X-\mu_x)(Y-\mu_y)]}{\sqrt{E(X-\mu_x)^2\, E(Y-\mu_y)^2}}$$

The sample correlation is given by

$$\hat{\rho}_{x,y} = \frac{\sum_{t=1}^{T}(x_t - \bar{x})(y_t - \bar{y})}{\sqrt{\sum_{t=1}^{T}(x_t - \bar{x})^2 \sum_{t=1}^{T}(y_t - \bar{y})^2}}$$
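A minimal sketch computing the sample correlation of two artificially correlated series directly from this formula, cross-checked against np.corrcoef:

import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(10_000)
y = 0.8 * x + 0.6 * rng.standard_normal(10_000)   # correlated with x (rho ~ 0.8)

xd, yd = x - x.mean(), y - y.mean()
rho_hat = (xd * yd).sum() / np.sqrt((xd**2).sum() * (yd**2).sum())

print(rho_hat)                  # ~0.8
print(np.corrcoef(x, y)[0, 1])  # same value from NumPy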

Two-dimensional densities

The joint CDF of two random variables, $X$ and $Y$, is the function $F_{X,Y}(\cdot,\cdot): \mathbb{R}^2 \to [0,1]$, which is defined by

$$F_{X,Y}(x,y) = P[X \le x, Y \le y]$$

If $X$ and $Y$ are continuous variables, then the joint PDF of $X$ and $Y$ is the function

$$f_{X,Y}(x,y) = \frac{\partial^2 F_{X,Y}(x,y)}{\partial x\, \partial y}$$
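As an illustration, for independent standard normals the joint CDF factorises as $F_{X,Y}(x,y) = \Phi(x)\Phi(y)$, so a finite-difference approximation of the mixed partial derivative should recover the product of the marginal PDFs. A minimal sketch, with an arbitrarily chosen evaluation point:

import numpy as np
import scipy.stats as st

x0, y0, h = 0.3, -0.7, 1e-4
F = lambda x, y: st.norm.cdf(x) * st.norm.cdf(y)   # joint CDF of independent N(0,1)s

# central finite-difference approximation of d^2 F / dx dy
f_approx = (F(x0 + h, y0 + h) - F(x0 + h, y0 - h)
            - F(x0 - h, y0 + h) + F(x0 - h, y0 - h)) / (4 * h * h)
f_exact = st.norm.pdf(x0) * st.norm.pdf(y0)

print(f_approx, f_exact)   # both ~ 0.119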

Bivariate Normal density functions

$\mathbf{X} = [X, Y]^T$ and $\mathbf{X} \sim N\!\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}\right)$

import numpy as np
import matplotlib.pyplot as plt
# matplotlib.mlab.bivariate_normal has been removed, so it is re-implemented here

def bivariate_normal(X, Y, sigmax=1.0, sigmay=1.0,
                     mux=0.0, muy=0.0, sigmaxy=0.0):
    """
    Bivariate Gaussian distribution for equal shape *X*, *Y*.
    See `bivariate normal
    <http://mathworld.wolfram.com/BivariateNormalDistribution.html>`_
    at mathworld.
    """
    Xmu = X - mux
    Ymu = Y - muy

    rho = sigmaxy / (sigmax * sigmay)
    z = Xmu**2 / sigmax**2 + Ymu**2 / sigmay**2 - 2 * rho * Xmu * Ymu / (sigmax * sigmay)
    denom = 2 * np.pi * sigmax * sigmay * np.sqrt(1 - rho**2)
    return np.exp(-z / (2 * (1 - rho**2))) / denom


def BivariateNormalPDFPlot():
    # number of grid points in each direction
    n = 40

    # parameters
    mu_1, mu_2 = 0, 0
    sigma_1, sigma_2 = 1, 0.5
    rhos = [0.0, -0.8, 0.8]      # one surface plot per correlation value

    x = np.linspace(-3.0, 3.0, n)
    y = np.linspace(-3.0, 3.0, n)
    X, Y = np.meshgrid(x, y)
    Z = lambda rho: bivariate_normal(X, Y, sigma_1, sigma_2, mu_1, mu_2,
                                     rho * sigma_1 * sigma_2)

    for k, rho in enumerate(rhos, start=1):
        fig = plt.figure(k)
        ax = fig.add_subplot(projection='3d')
        ax.plot_surface(X, Y, Z(rho), cmap='viridis', linewidth=0)
        ax.set_xlabel('X axis')
        ax.set_ylabel('Y axis')
        ax.set_zlabel('Z axis')
        plt.show()


BivariateNormalPDFPlot()

Hypothesis Testing

The t-statistic is the ratio of the departure of the estimated value of a parameter from its hypothesized value to its standard error.

It is used when the sample size is small or the population standard deviation is unknown.

Let $\hat{\beta}$ be an estimator of the parameter $\beta$ in some statistical model. Then the t-statistic is given by

$$t_{\hat{\beta}} = \frac{\hat{\beta} - \beta_0}{\mathrm{s.e.}(\hat{\beta})}$$

where $\mathrm{s.e.}(\hat{\beta})$ is the standard error of the estimator $\hat{\beta}$ for $\beta$, and $\beta_0$ is a non-random, known constant, which may or may not match the actual unknown parameter value $\beta$.
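A minimal sketch of the one-sample case with simulated data: estimate the mean, form the t-statistic against a hypothesized value $\beta_0$, and cross-check with scipy.stats.ttest_1samp.

import numpy as np
import scipy.stats as st

rng = np.random.default_rng(4)
x = rng.normal(loc=0.2, scale=1.0, size=30)   # small sample with true mean 0.2

beta_hat = x.mean()                           # estimator of the mean
beta_0 = 0.0                                  # hypothesized value under H0
se = x.std(ddof=1) / np.sqrt(len(x))          # standard error of the sample mean

t_manual = (beta_hat - beta_0) / se
t_scipy, p_value = st.ttest_1samp(x, beta_0)

print(t_manual, t_scipy)                      # identical t-statistics
print(p_value)                                # two-sided p-value from scipy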