What is ρ in the Bivariate Normal distribution?

Revision as of 12:12, 10 June 2015

In going from the Normal distributions for the horizontal axis, \(\mathcal{N}(\mu_h,\,\sigma_h^2)\), and the vertical axis, \(\mathcal{N}(\mu_v,\,\sigma_v^2)\), a new equation was postulated with an additional parameter \(\rho\):

    \( f(h,v; \mu_h, \mu_v, \sigma_h, \sigma_v, \rho) = \frac{1}{2 \pi \sigma_h \sigma_v \sqrt{1-\rho^2}} \exp\left( -\frac{1}{2(1-\rho^2)}\left[ \frac{(h-\mu_h)^2}{\sigma_h^2} + \frac{(v-\mu_v)^2}{\sigma_v^2} - \frac{2\rho(h-\mu_h)(v-\mu_v)}{\sigma_h \sigma_v} \right] \right) \)
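As a numerical sanity check, the density above can be evaluated directly and compared against SciPy's multivariate normal with the matching covariance matrix. The parameter values below are arbitrary illustrations, not data from the article.

```python
import numpy as np
from scipy.stats import multivariate_normal

def bivariate_normal_pdf(h, v, mu_h, mu_v, sigma_h, sigma_v, rho):
    """Density f(h, v) written term-by-term from the equation above."""
    z = ((h - mu_h)**2 / sigma_h**2
         + (v - mu_v)**2 / sigma_v**2
         - 2 * rho * (h - mu_h) * (v - mu_v) / (sigma_h * sigma_v))
    norm = 2 * np.pi * sigma_h * sigma_v * np.sqrt(1 - rho**2)
    return np.exp(-z / (2 * (1 - rho**2))) / norm

# Equivalent covariance matrix: [[sh^2, rho*sh*sv], [rho*sh*sv, sv^2]]
mu_h, mu_v, sh, sv, rho = 0.0, 0.0, 1.0, 2.0, 0.5
cov = [[sh**2, rho * sh * sv], [rho * sh * sv, sv**2]]
rv = multivariate_normal([mu_h, mu_v], cov)

print(bivariate_normal_pdf(0.3, -0.7, mu_h, mu_v, sh, sv, rho))
print(rv.pdf([0.3, -0.7]))  # same value
```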

First, a bit of explanation about what \(\rho\) is. Assuming that two variables are correlated, a simple model to propose is that they are linearly related. Thus for some point \(i\) the equation of interest is:

   \(v_i = v_0 + b h_i + \epsilon_i\)

Where \(v_0\) is the intercept along the vertical axis, \(b\) is the slope of the line, and \(\epsilon_i\) is the error in the \(i\)th measurement. Rearranging the equation for \(\epsilon_i\):

   \(\epsilon_i = v_i - v_0 - b h_i \)

Given the locations \((h_i, v_i)\) of the shots in the group on the target, the coefficients \(v_0\) and \(b\) are calculated to give a "best" fit to the data. The "best" fit is deemed to be when the sum of the squares of the errors (SSE) is minimized.

\(SSE = \sum_{i=1}^n \epsilon_i^2 = \sum_{i=1}^n \lbrace v_i - v_0 - b h_i \rbrace^2\)
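The minimization has a closed-form solution, which NumPy's `polyfit` also computes. A minimal sketch, using hypothetical shot coordinates:

```python
import numpy as np

# Hypothetical shot coordinates (illustration only).
h = np.array([-1.2, -0.5, 0.1, 0.8, 1.6])
v = np.array([-0.9, -0.3, 0.2, 0.5, 1.4])

# Closed-form least-squares estimates that minimize the SSE above.
b = np.sum((h - h.mean()) * (v - v.mean())) / np.sum((h - h.mean())**2)
v0 = v.mean() - b * h.mean()

residuals = v - v0 - b * h          # epsilon_i = v_i - v_0 - b h_i
sse = np.sum(residuals**2)

# np.polyfit minimizes the same SSE, so the coefficients agree.
b_np, v0_np = np.polyfit(h, v, 1)   # returns (slope, intercept)
```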

Two examples of best-fit lines are shown below. The graph on the right shows the "residuals" from the fitted line as vertical line segments, drawn from the horizontal value (assumed to be absolutely accurate) to the vertical value (assumed to contain the error). Values above the line are positive and values below the line are negative.

(Figures: "Linear regression" and "Linear least squares example")

With the best-fit line in hand, the correlation coefficient \(\rho\) is given by the equation:

    \(\rho = \rho_{hv} =\frac{\sum ^n _{i=1}(h_i - \bar{h})(v_i - \bar{v})}{\sqrt{\sum ^n _{i=1}(h_i - \bar{h})^2} \sqrt{\sum ^n _{i=1}(v_i - \bar{v})^2}}\)  where \(-1 \leq \rho \leq 1\)
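The formula can be applied directly; NumPy's `corrcoef` computes the same Pearson coefficient. The coordinates below are hypothetical illustrations.

```python
import numpy as np

# Hypothetical shot coordinates (illustration only).
h = np.array([-1.2, -0.5, 0.1, 0.8, 1.6])
v = np.array([-0.9, -0.3, 0.2, 0.5, 1.4])

# Pearson correlation, written term-by-term from the formula above.
num = np.sum((h - h.mean()) * (v - v.mean()))
den = np.sqrt(np.sum((h - h.mean())**2)) * np.sqrt(np.sum((v - v.mean())**2))
rho = num / den

# np.corrcoef builds the same coefficient into a 2x2 matrix.
rho_np = np.corrcoef(h, v)[0, 1]
```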

!! CAREFUL !! Blithely accepting the correlation coefficient as gospel is foolish. The residuals should be eyeballed in an effort to determine if there is some problem with the fit.

The figure below shows some examples of the correlation coefficients for data fit to lines. The sign of \(\rho\) comes from the slope of the line. In terms of the absolute value, \(\mid \rho \mid = 0 \) means that there is no correlation (i.e. the points are totally random), while \(\mid \rho \mid = 1 \) means that there is perfect correlation (i.e. all points lie exactly on the line).

(Figure: example data sets with their correlation coefficients)

!! CAREFUL !! Correlation does not prove causation.

Thus imagine an elliptical shot pattern, with a lot of shots to reduce noise, being rotated about its COI. Then \(\rho = 0\) when the major axis of the ellipse lies along the horizontal or vertical axis. The maximum value of \(\rho\) would occur when the major axis of the ellipse lies along one of the two 45-degree lines (negative and positive slope) bisecting the horizontal and vertical axes.
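This rotation thought-experiment is easy to simulate with synthetic data. The axis standard deviations below are arbitrary; the sample correlation is near zero when the major axis lies along a coordinate axis and near its maximum at 45 degrees.

```python
import numpy as np

rng = np.random.default_rng(0)
# Elliptical "shot group" centered on its COI: major-axis spread 3.0,
# minor-axis spread 0.5 (arbitrary illustration values).
pts = rng.normal(size=(10_000, 2)) * np.array([3.0, 0.5])

def rho_at(theta):
    """Sample correlation after rotating the group by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    rot = pts @ np.array([[c, -s], [s, c]]).T
    return np.corrcoef(rot[:, 0], rot[:, 1])[0, 1]

print(rho_at(0.0))         # major axis horizontal: rho near 0
print(rho_at(np.pi / 4))   # major axis at 45 degrees: rho near its maximum
```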

For the population of shots, if there is a linear relationship between the horizontal and vertical positions of a shot, then the point \((\mu_h, \mu_v)\) would be on the line. Thus around \((\mu_h, \mu_v)\), \(b\) would not only be the slope of the line, but would also be a proportionality constant.

\(b = \frac{(v-\mu_v)}{(h-\mu_h)}\)
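For a least-squares fit, the slope and the correlation coefficient are linked by the standard identity \(b = \rho \, s_v / s_h\), where \(s_h\) and \(s_v\) are the sample standard deviations. A quick numerical check, using hypothetical data:

```python
import numpy as np

# Hypothetical shot coordinates (illustration only).
h = np.array([-1.2, -0.5, 0.1, 0.8, 1.6])
v = np.array([-0.9, -0.3, 0.2, 0.5, 1.4])

rho = np.corrcoef(h, v)[0, 1]

# Least-squares slope, as in the SSE minimization earlier.
b_ls = np.sum((h - h.mean()) * (v - v.mean())) / np.sum((h - h.mean())**2)

# Identity: regression slope = rho * (s_v / s_h); the normalization in
# the standard deviations cancels in the ratio.
b_from_rho = rho * v.std() / h.std()
```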