# Difference between revisions of "Precision Models"

## Revision as of 19:00, 25 November 2013

# Statistical Models of Dispersion

## The General Bivariate Normal

The normal, a.k.a. Gaussian, distribution is the accepted model of a random variable like the dispersion of a physical gunshot from its center point. The normal distribution is parameterized by its mean and standard deviation, or \((\mu, \sigma)\). As explained in *What is Precision?* we are only interested in the dispersion component, since the center point of impact is controlled by sighting the gun in. Therefore we will assume that a gunner can dial \(\mu = 0\), and leave that parameter out of the question in what follows.

Since we are interested in shot dispersion on a two-dimensional target we will look at a bivariate normal distribution, which has separate parameters for the standard deviation in each dimension, \(\sigma_x, \sigma_y\), as well as a correlation parameter *ρ*.

## Uncorrelated Bivariate Normal

We don't have any compelling evidence that in general there is, or should be, correlation between the horizontal and vertical dispersion of gunshots. Therefore, for purposes of modelling we set *ρ* = 0.

We do know that targets can often exhibit vertical or horizontal stringing, and therefore \(\sigma_x \neq \sigma_y\). To the extent these parameters are not equal they produce elliptical instead of round shot groups.

However, we know some of the significant sources of stringing and can factor them out:

- The primary source of x-specific variance is crosswind. If we measure the wind while shooting we can bound and remove a “wind variance” term from that axis. E.g., "Suppose the orthogonal component of wind is ranging at random from 0-10mph during the shooting. Given time-of-flight *t* this will expand the no-wind horizontal dispersion at the target by \(\sigma_w\)." (To do: closed-form equation for this?)

- The primary source of y-specific variance is muzzle velocity, which we can actually measure with a chronograph (or assert) and then remove from that axis. E.g., "If standard deviation of muzzle velocity is \(\sigma_{mv}\) then, given time-of-flight *t* the vertical spread attributable to that is some \(D(t, \sigma_{mv})\)." (To do: closed-form equation for this?)

## Symmetric Bivariate Normal = Rayleigh

After factoring out the known sources of asymmetry in the bivariate normal model we believe that shot groups are sufficiently symmetric that we can assume \(\sigma_x = \sigma_y\). In this case the dispersion of shots is modeled by a symmetric bivariate normal, in which the distance of each shot from the center follows the Rayleigh distribution ([*Shot group statistics*, Jeroen Hogema, 2005](http://home.kpn.nl/jhhogema1966/skeetn/ballist/sgs/sgs.htm#_Toc96439743)), described by a single parameter *σ*.

NB: It is common to describe normal distributions using variance, or \(\sigma^2\), because the variances of independent sources add linearly, a convenience that is lost when we take the square root. For similar reasons many prefer to describe the Rayleigh distribution using a parameter \(\gamma = \sigma^2\). In our parameterization *σ* is the standard deviation along each axis of the symmetric bivariate normal distribution, which gives the following pdf for the Rayleigh-distributed radius \(r \geq 0\):

- \(f(r) = \frac{r}{\sigma^2}e^{-r^2/(2\sigma^2)}\)
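The link between the symmetric bivariate normal and the Rayleigh distribution can be checked numerically. The following is a minimal sketch, not part of the original page: the function names, the seed, and the value σ = 1.5 are all illustrative assumptions. It draws independent x and y coordinates with the same σ and compares the average radius to the known Rayleigh mean \(\sigma\sqrt{\pi/2}\).

```python
import math
import random


def rayleigh_pdf(r, sigma):
    """Density of the radial miss distance for a symmetric bivariate normal."""
    if r < 0:
        return 0.0
    return (r / sigma**2) * math.exp(-r**2 / (2 * sigma**2))


def simulate_radii(n_shots, sigma, seed=0):
    """Draw shots with independent N(0, sigma) x and y; return their radii."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatability
    return [math.hypot(rng.gauss(0, sigma), rng.gauss(0, sigma))
            for _ in range(n_shots)]


radii = simulate_radii(100_000, sigma=1.5)
mean_radius = sum(radii) / len(radii)
# The two printed numbers should agree closely if the radii are Rayleigh.
print(mean_radius, 1.5 * math.sqrt(math.pi / 2))
```

The radii here are measured from the true center (0, 0), which is known only because the data are simulated.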

# Estimating *σ*

There are a number of pitfalls when it comes to estimating *σ* from sample targets. We will reference the following three correction factors in what follows. Note that all of these correction factors are > 1, are significant for very small *n*, and converge towards 1 as \(n \to \infty\). Their values are listed for *n* up to 100 in Media:Sigma1ShotStatistics.ods.

## Bessel correction factor

The Bessel correction removes bias in sample variance.

- \(c_{B}(n) = \frac{n}{n-1}\)

One way of thinking of this bias in this context is to realize that we can never observe the true center of the shot distribution. When we calculate the center of a group on a target it will likely be some distance from the true center, and thus underestimate the true distance of the sample shots to the distribution center. (Average distance from sample center to true center is listed in the second column of Media:Sigma1ShotStatistics.ods.)
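The effect of measuring from the sample center instead of the true center can be seen in a small simulation. This is an illustrative sketch only: the function name, trial count, and seed are assumptions made for the example. It averages the single-axis sample variance over many simulated groups, once dividing by *n* and once by *n* - 1.

```python
import random


def mean_sample_variance(n, trials, sigma=1.0, seed=1):
    """Average the biased (/n) and Bessel-corrected (/(n-1)) variance estimates."""
    rng = random.Random(seed)
    biased_total = 0.0
    unbiased_total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0, sigma) for _ in range(n)]
        x_bar = sum(xs) / n  # sample center, not the true center (0)
        ss = sum((x - x_bar) ** 2 for x in xs)
        biased_total += ss / n
        unbiased_total += ss / (n - 1)
    return biased_total / trials, unbiased_total / trials


biased, unbiased = mean_sample_variance(n=3, trials=50_000)
# With sigma = 1 the true variance is 1; the /n average should sit near 2/3.
print(biased, unbiased)
```

For three-shot groups the uncorrected estimate recovers only about two thirds of the true variance, which is exactly the factor \(1/c_B(3)\).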

## Gaussian correction factor

The Gaussian correction (sometimes called \(c_4\)) removes bias introduced by taking the square root of variance.

- \(\frac{1}{c_{G}(n)} = \sqrt{\frac{2}{n-1}}\,\frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)} \, = \, 1 - \frac{1}{4n} - \frac{7}{32n^2} - \frac{19}{128n^3} + O(n^{-4})\)

The third-order approximation is adequate. The following spreadsheet formula gives a more direct calculation of \(1/c_G(n)\); take its reciprocal to get \(c_G(n)\): `=EXP(LN(SQRT(2/(N-1))) + GAMMALN(N/2) - GAMMALN((N-1)/2))`
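The same calculation can be sketched outside a spreadsheet. The function names below are hypothetical; the code uses Python's standard `math.lgamma` in place of `GAMMALN` and compares the exact value of \(1/c_G(n)\) with the third-order series.

```python
import math


def c4(n):
    """Exact 1/c_G(n): the expected ratio of sample std dev to sigma."""
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def c4_series(n):
    """Third-order series approximation to 1/c_G(n)."""
    return 1 - 1 / (4 * n) - 7 / (32 * n**2) - 19 / (128 * n**3)


def c_gauss(n):
    """Gaussian correction factor c_G(n), always greater than 1."""
    return 1.0 / c4(n)


for n in (2, 5, 10, 30):
    print(n, c4(n), c4_series(n), c_gauss(n))
```

Working with log-gammas avoids overflow in the gamma functions for large *n*; the series and the exact value differ most at the smallest *n*.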

## Rayleigh correction factor

As with the normal distribution, the unbiased estimator for the Rayleigh distribution estimates \(\sigma^2\), not *σ*. The following factor corrects for the bias introduced by the concavity of the square root when we take it to get *σ*.

- \(c_{R}(n) = 4^n \sqrt{\frac{n}{\pi}} \frac{n!(n-1)!}{(2n)!}\) (*Statistical Inference for Rayleigh Distributions*, M. M. Siddiqui, 1964, p. 1007)

To avoid overflows this is better calculated using log-gammas, as in the following spreadsheet formula: `=EXP(LN(SQRT(N/PI())) + N*LN(4) + GAMMALN(N+1) + GAMMALN(N) - GAMMALN(2*N+1))`
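A rough Python equivalent of the log-gamma calculation is sketched below. The function names are made up for this example; `c_rayleigh_direct` is included only to cross-check the log-gamma version against the factorial formula for small *n*.

```python
import math


def c_rayleigh(n):
    """Rayleigh correction factor c_R(n), computed in log space."""
    log_value = (0.5 * math.log(n / math.pi)
                 + n * math.log(4)
                 + math.lgamma(n + 1)        # log n!
                 + math.lgamma(n)            # log (n-1)!
                 - math.lgamma(2 * n + 1))   # log (2n)!
    return math.exp(log_value)


def c_rayleigh_direct(n):
    """Same factor from factorials; only practical for small n."""
    return (4**n * math.sqrt(n / math.pi)
            * math.factorial(n) * math.factorial(n - 1)
            / math.factorial(2 * n))


for n in (1, 3, 5, 10, 1000):
    print(n, c_rayleigh(n))
```

The log-space version stays finite for *n* in the thousands, where the factorials in the direct formula become unwieldy.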

## Data

In the following formulas assume that we are looking at a target showing *n* shots and that we are able to determine the center coordinates *x* and *y* for each shot.

(One easy way to compile these data is to process an image of the target through a program like OnTarget Precision Calculator.)

## Variance Estimates

For a single axis the unbiased estimate of variance for a normal distribution is \(s_x^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} \), from which the unbiased estimate of standard deviation is \(\widehat{\sigma_x} = c_G(n) \sqrt{(s_x^2)}\).

Since we are assuming that the shot dispersion is independent and identically distributed along the *x* and *y* axes we improve our estimate by aggregating the data from both dimensions. I.e., we look at the average sample variance \(s^2 = (s_x^2 + s_y^2)/2\), and \(\hat{\sigma} = c_G(2n-1) \sqrt{s^2}\). The argument is \(2n-1\) because the pooled variance has \(2(n-1)\) degrees of freedom, the same as a single-axis sample of size \(2n-1\). This turns out to be identical to the Rayleigh estimator.

## Rayleigh Estimates

The Rayleigh distribution describes the random variable *R* defined as the distance of each shot from the center of the distribution. Again, we never get to observe the true center, so we begin by calculating the sample center \((\bar{x}, \bar{y})\). Then for each shot we can compute the sample radius \(r_i = \sqrt{(x_i - \bar{x})^2 + (y_i - \bar{y})^2}\).

The unbiased Rayleigh estimator is \(s^2 = c_B(n) \frac{\sum r_i^2}{2n}\), from which the unbiased estimate of *σ* is once again \(\hat{\sigma} = c_G(2n-1) \sqrt{s^2}\).
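The equivalence of the pooled-variance and Rayleigh routes can be sketched in a few lines. Everything named here is hypothetical: the function names are illustrative and the five shot coordinates are invented purely to exercise the arithmetic, not taken from any real target.

```python
import math


def c_gauss(n):
    """Gaussian correction factor c_G(n) via log-gammas."""
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return 1.0 / (math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio))


def sigma_estimates(shots):
    """Return (pooled s^2, Rayleigh s^2, sigma_hat) for a list of (x, y) shots."""
    n = len(shots)  # needs n >= 2
    x_bar = sum(x for x, _ in shots) / n
    y_bar = sum(y for _, y in shots) / n
    s2_x = sum((x - x_bar) ** 2 for x, _ in shots) / (n - 1)
    s2_y = sum((y - y_bar) ** 2 for _, y in shots) / (n - 1)
    s2_pooled = (s2_x + s2_y) / 2
    # Rayleigh route: squared radii from the sample center, Bessel-corrected.
    sum_r2 = sum((x - x_bar) ** 2 + (y - y_bar) ** 2 for x, y in shots)
    s2_rayleigh = (n / (n - 1)) * sum_r2 / (2 * n)
    sigma_hat = c_gauss(2 * n - 1) * math.sqrt(s2_rayleigh)
    return s2_pooled, s2_rayleigh, sigma_hat


# Made-up coordinates for illustration only.
shots = [(0.3, -0.2), (-0.5, 0.4), (0.1, 0.6), (-0.2, -0.7), (0.4, 0.1)]
s2_pooled, s2_rayleigh, sigma_hat = sigma_estimates(shots)
print(s2_pooled, s2_rayleigh, sigma_hat)
```

The two variance estimates agree to floating-point precision because \(\sum r_i^2\) is just the sum of the two per-axis sums of squares.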

## Confidence Intervals

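Siddiqui (*Some Problems Connected With Rayleigh Distributions*, M. M. Siddiqui, 1961, p. 169) shows that confidence intervals for \(\sigma^2\) are given by the \(\chi^2\) distribution with 2*n* degrees of freedom. To find the (1 - *α*) confidence interval, first find \(\chi_1^2, \ \chi_2^2\) where:

- \(Pr(\chi^2(2n) \leq \chi_1^2) = \alpha/2, \quad Pr(\chi^2(2n) \leq \chi_2^2) = 1 - \alpha/2\)

then

- \(\frac{2n\overline{r^2}}{\chi_2^2} \leq \sigma^2 \leq \frac{2n\overline{r^2}}{\chi_1^2}\)

Here \(2n\overline{r^2}\) must equal \(\sum r_i^2\), i.e. \(\overline{r^2} = \sum r_i^2 / (2n)\), because it is \(\sum r_i^2 / \sigma^2\) that follows the \(\chi^2(2n)\) distribution when the radii are measured from the true center.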
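The interval can be computed without a statistics library, because the \(\chi^2\) CDF has a closed form when the degrees of freedom are even. The sketch below rests on stated assumptions: it uses the fact that \(\sum r_i^2/\sigma^2\) follows \(\chi^2(2n)\) when the radii are measured from the true center, it keeps the 2*n* degrees of freedom given in the text, and all function names and the sample radii are invented for illustration.

```python
import math


def chi2_cdf_even(x, n):
    """CDF of the chi-square distribution with 2n degrees of freedom."""
    if x <= 0:
        return 0.0
    half = x / 2
    term = 1.0
    total = 1.0
    for k in range(1, n):
        term *= half / k  # builds (x/2)^k / k!
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_quantile_even(p, n):
    """Invert chi2_cdf_even by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, n) < p:
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf_even(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def sigma_confidence_interval(radii, alpha=0.05):
    """(1 - alpha) interval for sigma from radii about the true center."""
    n = len(radii)
    sum_r2 = sum(r * r for r in radii)
    chi_1 = chi2_quantile_even(alpha / 2, n)
    chi_2 = chi2_quantile_even(1 - alpha / 2, n)
    return math.sqrt(sum_r2 / chi_2), math.sqrt(sum_r2 / chi_1)


# Made-up radii for illustration only.
low, high = sigma_confidence_interval([1.0, 0.5, 1.5, 0.8, 1.2])
print(low, high)
```

The bounds are reported for *σ* by taking square roots of the bounds on \(\sigma^2\). The simple summation in `chi2_cdf_even` is intended for the modest shot counts of a typical target, not for very large *n*.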