What is ρ in the Bivariate Normal distribution?

From ShotStat
Revision as of 18:51, 9 June 2015 by Herb (talk | contribs) (ok, this should be close to done...)

In going from the Normal distributions for the horizontal axis, \(\mathcal{N}(\mu_h,\,\sigma_h^2)\), and the vertical axis, \(\mathcal{N}(\mu_v,\,\sigma_v^2)\), to the Bivariate Normal distribution, a new equation was postulated with an additional parameter \(\rho\):

    \( f(h,v; \mu_h, \mu_v, \sigma_h, \sigma_v, \rho) = \frac{1}{2 \pi \sigma_h \sigma_v \sqrt{1-\rho^2}} \exp\left( -\frac{1}{2(1-\rho^2)}\left[ \frac{(h-\mu_h)^2}{\sigma_h^2} + \frac{(v-\mu_v)^2}{\sigma_v^2} - \frac{2\rho(h-\mu_h)(v-\mu_v)}{\sigma_h \sigma_v} \right] \right) \)
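As a minimal sketch of the density above, the following Python function evaluates \(f\) term by term. The function names and the check against the product of two one-dimensional Normal densities are illustrative assumptions, not part of the original page.

```python
import math

def bivariate_normal_pdf(h, v, mu_h, mu_v, sigma_h, sigma_v, rho):
    """Evaluate the bivariate Normal density f(h, v) for the given parameters."""
    if not -1.0 < rho < 1.0:
        # The density is undefined at rho = +/-1 because of the 1 - rho^2 terms.
        raise ValueError("rho must lie strictly between -1 and 1")
    dh = (h - mu_h) / sigma_h          # standardized horizontal offset
    dv = (v - mu_v) / sigma_v          # standardized vertical offset
    one_minus_rho2 = 1.0 - rho * rho
    norm = 1.0 / (2.0 * math.pi * sigma_h * sigma_v * math.sqrt(one_minus_rho2))
    quad = dh * dh + dv * dv - 2.0 * rho * dh * dv
    return norm * math.exp(-quad / (2.0 * one_minus_rho2))

def normal_pdf(x, mu, sigma):
    """One-dimensional Normal density, used only for the rho = 0 check."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
```

With \(\rho = 0\) the cross term vanishes and the function returns the product of the horizontal and vertical Normal densities, which is a convenient sanity check.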

First, a bit of explanation about what \(\rho\) is. If two variables are correlated, the simplest relationship to propose is a linear one. Thus for some shot \(i\) the equation of interest is:

   \(v_i = v_0 + \beta h_i + \epsilon_i\)

where \(v_0\) is the intercept along the vertical axis, \(\beta\) is the slope of the line, and \(\epsilon_i\) is the error in the \(i\)th measurement. Rearranging the equation for \(\epsilon_i\):

   \(\epsilon_i = v_i - v_0 - \beta h_i \)

Given the locations \((h_i, v_i)\) of the shots in the group on the target, the coefficients \(v_0\) and \(\beta\) are calculated to give a "best" fit to the data. The "best" fit is deemed to be when the sum of the squares of the errors (SSE) is minimized.

   \(SSE = \sum_{i=1}^n \epsilon_i^2 = \sum_{i=1}^n \left( v_i - v_0 - \beta h_i \right)^2\)
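The minimization can be sketched in a few lines of Python. The closed-form expressions used here, \(\beta = S_{hv}/S_{hh}\) and \(v_0 = \bar{v} - \beta\bar{h}\), are the standard least-squares solution for a straight line; the page itself does not spell them out, and the function names are hypothetical.

```python
def fit_line(h, v):
    """Return (v0, beta) minimizing the sum of squared vertical residuals."""
    n = len(h)
    h_bar = sum(h) / n
    v_bar = sum(v) / n
    # Sums of squares and cross-products about the means.
    s_hh = sum((x - h_bar) ** 2 for x in h)
    s_hv = sum((x - h_bar) * (y - v_bar) for x, y in zip(h, v))
    beta = s_hv / s_hh            # slope
    v0 = v_bar - beta * h_bar     # intercept on the vertical axis
    return v0, beta

def sse(h, v, v0, beta):
    """Sum of squared errors for the line v = v0 + beta * h."""
    return sum((y - v0 - beta * x) ** 2 for x, y in zip(h, v))
```

For any set of shots, moving either coefficient away from the values returned by `fit_line` should not decrease the value returned by `sse`.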

Two examples of best-fit lines are shown below. The graph on the right shows the "residuals" from the fit line as vertical line segments running from the line to each data point: the horizontal value is assumed to be exact, and all of the error is assumed to be in the vertical value. Residuals for points above the line are positive and those for points below the line are negative.

[Figures: Linear regression.png (left); Linear least squares example.png (right)]

With the "best" line fit, the correlation coefficient \(\rho\) is given by the equation:

    \(\rho = \rho_{hv} =\frac{\sum ^n _{i=1}(h_i - \bar{h})(v_i - \bar{v})}{\sqrt{\sum ^n _{i=1}(h_i - \bar{h})^2} \sqrt{\sum ^n _{i=1}(v_i - \bar{v})^2}}\)  where \(-1 \leq \rho \leq 1\)
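A direct transcription of this formula into Python might look like the sketch below. The function name is an assumption made for illustration, and the function expects at least two shots that are not all at the same horizontal or vertical position (otherwise the denominator is zero).

```python
import math

def correlation(h, v):
    """Sample correlation coefficient rho_hv of paired shot coordinates."""
    n = len(h)
    h_bar = sum(h) / n
    v_bar = sum(v) / n
    s_hv = sum((x - h_bar) * (y - v_bar) for x, y in zip(h, v))
    s_hh = sum((x - h_bar) ** 2 for x in h)
    s_vv = sum((y - v_bar) ** 2 for y in v)
    return s_hv / (math.sqrt(s_hh) * math.sqrt(s_vv))
```

Points lying exactly on a line of positive slope give a value of 1, points on a line of negative slope give -1, and a pattern that is symmetric about a vertical line through its center gives 0.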

Now imagine an elliptical shot pattern, with enough shots to reduce noise, being rotated about its COI. Then \(\rho = 0\) when the major axis of the ellipse lies along the horizontal or the vertical axis. The magnitude of \(\rho\) is greatest when the major axis of the ellipse lies along one of the two 45 degree lines bisecting the horizontal and vertical axes: \(\rho\) is at its maximum on the line of positive slope and at its minimum on the line of negative slope.
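The rotation argument can be checked numerically at the population level. The sketch below assumes a bivariate Normal pattern whose standard deviations along the major and minor axes of the ellipse are `sigma_major` and `sigma_minor` (hypothetical parameter names), rotates its covariance matrix by an angle measured from the horizontal axis, and returns the resulting \(\rho\).

```python
import math

def rho_of_rotated_ellipse(sigma_major, sigma_minor, theta_deg):
    """rho for an elliptical pattern whose major axis is at theta_deg to horizontal."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    a2 = sigma_major ** 2   # variance along the major axis
    b2 = sigma_minor ** 2   # variance along the minor axis
    # Entries of R * diag(a2, b2) * R^T for a rotation R by angle t.
    var_h = a2 * c * c + b2 * s * s
    var_v = a2 * s * s + b2 * c * c
    cov_hv = (a2 - b2) * s * c
    return cov_hv / math.sqrt(var_h * var_v)
```

Under these assumptions \(\rho\) is zero at 0 and 90 degrees, and at 45 degrees it equals \((a^2 - b^2)/(a^2 + b^2)\), where \(a\) and \(b\) are the major-axis and minor-axis standard deviations. A circular pattern (\(a = b\)) gives \(\rho = 0\) at every angle.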


For the population of shots, if there is a linear relationship between the horizontal and vertical positions of a shot, then the point \((\mu_h, \mu_v)\) would be on the line. Thus, measured from \((\mu_h, \mu_v)\), \(\beta\) would be not only the slope of the line but also a proportionality constant between the vertical and horizontal offsets:

\(\beta = \frac{(v-\mu_v)}{(h-\mu_h)}\)