Difference between revisions of "What is ρ in the Bivariate Normal distribution?"

From ShotStat

Revision as of 18:51, 9 June 2015

In going from the Normal distributions for the horizontal axis, \(\mathcal{N}(\mu_h,\,\sigma_h^2)\), and the vertical axis, \(\mathcal{N}(\mu_v,\,\sigma_v^2)\), to the bivariate Normal distribution, a new equation was postulated with an additional parameter \(\rho\):

    \( f(h,v; \mu_h, \mu_v, \sigma_h, \sigma_v, \rho) = \frac{1}{2 \pi \sigma_h \sigma_v \sqrt{1-\rho^2}} \exp\left( -\frac{1}{2(1-\rho^2)}\left[ \frac{(h-\mu_h)^2}{\sigma_h^2} + \frac{(v-\mu_v)^2}{\sigma_v^2} - \frac{2\rho(h-\mu_h)(v-\mu_v)}{\sigma_h \sigma_v} \right] \right) \)
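To make the role of each parameter concrete, the density above can be evaluated numerically. The following is a minimal sketch, not part of the original page; the function names `bivariate_normal_pdf` and `normal_pdf` are illustrative choices. It also shows that with \(\rho = 0\) the density reduces to the product of the two one-dimensional Normal densities.

```python
import math

def bivariate_normal_pdf(h, v, mu_h, mu_v, sigma_h, sigma_v, rho):
    """Evaluate f(h, v) from the equation above; requires -1 < rho < 1."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    # Standardized offsets from the mean along each axis.
    zh = (h - mu_h) / sigma_h
    zv = (v - mu_v) / sigma_v
    one_minus_rho2 = 1.0 - rho * rho
    norm = 1.0 / (2.0 * math.pi * sigma_h * sigma_v * math.sqrt(one_minus_rho2))
    quad = zh * zh + zv * zv - 2.0 * rho * zh * zv
    return norm * math.exp(-quad / (2.0 * one_minus_rho2))

def normal_pdf(x, mu, sigma):
    """One-dimensional Normal density, used only for comparison."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Illustrative (made-up) values: with rho = 0 the two results should agree.
joint = bivariate_normal_pdf(0.3, -1.2, 0.1, 0.2, 1.5, 0.7, 0.0)
product = normal_pdf(0.3, 0.1, 1.5) * normal_pdf(-1.2, 0.2, 0.7)
```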

First, a bit of explanation about what \(\rho\) is. If two variables are correlated, the simplest relationship to propose is a linear one. Thus for some point \(i\) the equation of interest is:

   \(v_i = v_0 + \beta h_i + \epsilon_i\)

Where \(v_0\) is the intercept along the vertical axis, \(\beta\) is the slope of the line, and \(\epsilon_i\) is the error in the \(i\)th measurement. Rearranging the equation for \(\epsilon_i\):

   \(\epsilon_i = v_i - v_0 - \beta h_i \)

Given the locations \((h_i, v_i)\) of the shots in the group on the target, the coefficients \(v_0\) and \(\beta\) are calculated to give a "best" fit to the data. The "best" fit is deemed to be when the sum of the squares of the errors (SSE) is minimized.

\(SSE = \sum_{i=1}^n \epsilon_i^2 = \sum_{i=1}^n \lbrace v_i - v_0 - \beta h_i \rbrace^2\)
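As an illustration of the fit described above, the sketch below computes \(v_0\) and \(\beta\) with the standard closed-form least-squares solution for a straight line and evaluates the sum of squared errors. The function names `fit_line` and `sse` and the shot coordinates are hypothetical, chosen only for this example.

```python
def fit_line(h, v):
    """Return (v0, beta) minimizing the sum of squared vertical errors."""
    n = len(h)
    h_bar = sum(h) / n
    v_bar = sum(v) / n
    s_hv = sum((hi - h_bar) * (vi - v_bar) for hi, vi in zip(h, v))
    s_hh = sum((hi - h_bar) ** 2 for hi in h)
    beta = s_hv / s_hh           # slope of the best-fit line
    v0 = v_bar - beta * h_bar    # the line passes through (h_bar, v_bar)
    return v0, beta

def sse(h, v, v0, beta):
    """Sum of squared errors for the line v = v0 + beta * h."""
    return sum((vi - v0 - beta * hi) ** 2 for hi, vi in zip(h, v))

# Made-up shot positions, for illustration only.
h = [-1.0, 0.0, 1.0, 2.0]
v = [-0.9, 0.2, 0.8, 2.1]
v0, beta = fit_line(h, v)
best = sse(h, v, v0, beta)
```

Any other choice of intercept or slope gives a larger SSE for the same data, which is what "best" fit means here.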

Two examples of best-fit lines are shown below. The graph on the right shows the "residuals" from the fit line as vertical line segments. Each segment runs from the line, at a horizontal value that is assumed to be absolutely accurate, to the measured vertical value, which is assumed to contain all of the error. Residuals for points above the line are positive, and residuals for points below the line are negative.

[Figures: 800px-Linear regression.png (left), Linear least squares example.png (right)]

With the "best line" fit in hand, the correlation coefficient \(\rho\) is given by the equation:

    \(\rho = \rho_{hv} =\frac{\sum ^n _{i=1}(h_i - \bar{h})(v_i - \bar{v})}{\sqrt{\sum ^n _{i=1}(h_i - \bar{h})^2} \sqrt{\sum ^n _{i=1}(v_i - \bar{v})^2}}\)  where \(-1 \leq \rho \leq 1\)
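The equation above translates directly into code. The following is a minimal sketch; the function name `correlation` and the sample coordinates are assumptions made for illustration.

```python
import math

def correlation(h, v):
    """Correlation coefficient of paired horizontal and vertical positions."""
    n = len(h)
    h_bar = sum(h) / n
    v_bar = sum(v) / n
    s_hv = sum((hi - h_bar) * (vi - v_bar) for hi, vi in zip(h, v))
    s_hh = sum((hi - h_bar) ** 2 for hi in h)
    s_vv = sum((vi - v_bar) ** 2 for vi in v)
    return s_hv / (math.sqrt(s_hh) * math.sqrt(s_vv))

# Made-up shot positions, for illustration only.
rho = correlation([-1.0, 0.0, 1.0, 2.0], [-0.9, 0.2, 0.8, 2.1])
```

Points lying exactly on a line of positive slope give \(\rho = 1\), and points lying exactly on a line of negative slope give \(\rho = -1\).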

Now imagine an elliptical shot pattern, with a lot of shots to reduce noise, being rotated about its COI. Then \(\rho = 0\) when the major axis of the ellipse lies along the horizontal or vertical axis. The magnitude of \(\rho\) is greatest when the major axis of the ellipse lies along one of the two 45 degree lines bisecting the horizontal and vertical axes: \(\rho\) reaches its maximum along the line of positive slope and its minimum (most negative value) along the line of negative slope.
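The rotation argument can be checked numerically. The sketch below assumes the shots follow a bivariate Normal distribution whose independent axes are the major and minor axes of the ellipse, with standard deviations `sigma_major` and `sigma_minor` (hypothetical names and values), and rotates that distribution by an angle measured from the horizontal axis.

```python
import math

def rho_of_rotated_ellipse(sigma_major, sigma_minor, theta_deg):
    """rho between h and v when the major axis is rotated theta_deg from horizontal."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    a, b = sigma_major ** 2, sigma_minor ** 2   # variances along the ellipse axes
    var_h = a * c * c + b * s * s               # variance along the horizontal axis
    var_v = a * s * s + b * c * c               # variance along the vertical axis
    cov_hv = (a - b) * s * c                    # covariance of h and v
    return cov_hv / math.sqrt(var_h * var_v)

# Illustrative values: an ellipse twice as long as it is wide.
angles = [-45, 0, 30, 45, 60, 90]
rhos = [rho_of_rotated_ellipse(2.0, 1.0, a) for a in angles]
```

Under these assumptions \(\rho\) at 45 degrees equals \((\sigma_{major}^2 - \sigma_{minor}^2)/(\sigma_{major}^2 + \sigma_{minor}^2)\), so it approaches 1 only as the ellipse becomes very elongated.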


For the population of shots, if there is a linear relationship between the horizontal and vertical positions of a shot, then the point \((\mu_h, \mu_v)\) would be on the line. Thus, measured relative to \((\mu_h, \mu_v)\), \(\beta\) would not only be the slope of the line, but also the proportionality constant between the vertical and horizontal offsets of any point \((h, v)\) on the line:

\(\beta = \frac{(v-\mu_v)}{(h-\mu_h)}\)