Difference between revisions of "Precision Models"

From ShotStat
(Example)
(Statistical Analysis of Dispersion)
(43 intermediate revisions by 2 users not shown)
Line 1: Line 1:
= Statistical Models of Dispersion =
+
<p style="text-align:right"><B>Previous:</B> [[Describing Precision]]</p>
  
== The General Bivariate Normal ==
+
= Models of Dispersion =
[http://en.wikipedia.org/wiki/Normal_distribution The normal, a.k.a. Gaussian, distribution] is the accepted model of a random variable like the dispersion of a physical gunshot from its center point.  The normal distribution is parameterized by its mean and standard deviation, or <math>(\mu, \sigma)</math>.  As explained in ''[[What is Precision?]]'' we are only interested in the dispersion component, since the center point of impact is controlled by sighting the gun in.  Therefore we will assume that a gunner can dial <math>\mu = 0</math>, and leave that parameter out of the question in what follows.
 
  
Since we are interested in shot dispersion on a two-dimensional target we will look at a [http://en.wikipedia.org/wiki/Bivariate_normal_distribution bivariate normal distribution], which has separate parameters for the standard deviation in each dimension, <math>\sigma_x, \sigma_y</math>, as well as a correlation parameter ''ρ''.
+
We present five options for measuring and analyzing precision:
 +
# [[Closed Form Precision]]
 +
# [[Circular Error Probable]]
 +
# [[Elliptic Error Probable]]
 +
# [[Range Statistics]]
 +
# [[Order Statistics]]
  
== Uncorrelated Bivariate Normal ==
+
Before selecting one consider the following background:
We don't have any compelling evidence that in general there is, or should be, correlation between the horizontal and vertical dispersion of gunshots.  Therefore, for purposes of modelling we set ''ρ'' = 0.
 
  
We do know that targets can often exhibit vertical or horizontal stringing, and therefore <math>\sigma_x \neq \sigma_y</math>.  To the extent these parameters are not equal they produce elliptical instead of round shot groups.
+
== General Bivariate Normal ==
 +
[http://en.wikipedia.org/wiki/Normal_distribution The normal, a.k.a. Gaussian, distribution] is the broadly accepted model of a random variable like the dispersion of a physical gunshot from its center point.  The normal distribution is parameterized by its mean and standard deviation, or <math>(\mu, \sigma)</math>.  As explained in ''[[What is Precision?]]'' we are only interested in the dispersion component, since the center point of impact is controlled by [[FAQ#How_many_shots_do_I_need_to_sight_in.3F|sighting in the gun]] (i.e., adjusting its aiming device).  Therefore we will assume that a gunner can dial <math>\mu \approx 0</math> and leave that parameter out of the question in what follows.
  
However, we know some of the significant sources of stringing and can factor them out:
+
Since we are interested in shot dispersion on a two-dimensional target we will look at a [http://en.wikipedia.org/wiki/Bivariate_normal_distribution bivariate normal distribution], which has separate parameters for the standard deviation in each dimension, <math>\sigma_x, \sigma_y</math>, as well as a correlation parameter ''ρ''.
  
# The primary source of x-specific variance is crosswind.  If we measure the wind while shooting we can bound and remove a “wind variance” term from that axis.  E.g., "Suppose the orthogonal component of wind is ranging at random from 0-10mph during the shooting.  Given time-of-flight ''t'' this will expand the no-wind horizontal dispersion at the target by <math>\sigma_w</math>." [[ToDo|Closed-form equation for this?]]
+
== Uncorrelated Bivariate Normal ==
 
+
We don't have any compelling evidence that in general there is, or should be, correlation between the horizontal and vertical dispersion of gunshots.  Therefore, for most of our analysis we will assume ''ρ'' = 0.
# The primary source of y-specific variance is muzzle velocity, which we can actually measure with a chronograph (or assert) and then remove from that axis.  E.g., "If standard deviation of muzzle velocity is <math>\sigma_{mv}</math> then, given time-of-flight ''t'' the vertical spread attributable to that is some <math>D(t, \sigma_{mv})</math>." [[ToDo|Closed-form equation for this?]]
 
 
 
Substantially [[asymmetric shot groups]] will be addressed in a separate section.
 
 
 
== Symmetric Bivariate Normal = Rayleigh ==
 
After factoring out the known sources of asymmetry in the bivariate normal model we believe that shot groups are sufficiently symmetric that we can assume <math>\sigma_x = \sigma_y</math>.  In this case the dispersion of shots is modeled by a symmetric bivariate normal, which is equivalent<ref>[http://home.kpn.nl/jhhogema1966/skeetn/ballist/sgs/sgs.htm#_Toc96439743 ''Shot group statistics'', Jeroen Hogema, 2005]</ref> to the Rayleigh distribution, described by a single parameter ''σ''.
 
 
 
NB: It is common to describe normal distributions using variance, or <math>\sigma^2</math>, because variances have some convenient linear characteristics that are lost when we take the square root.  For similar reasons many prefer to describe the Rayleigh distribution using a parameter <math>\gamma = \sigma^2</math>.  To be clear about our parameterization: the ''σ'' we describe is the standard deviation of the bivariate normal distribution along each axis, which produces the following pdf for the Rayleigh distribution:
 
:&nbsp; <math>\frac{x}{\sigma^2}e^{-x^2/2\sigma^2}</math>
 
 
 
= Estimating ''σ''=
 
There are a number of pitfalls when it comes to estimating ''σ'' from sample targets.  We will reference the following three correction factors in what follows.  Note that all of these correction factors are > 1, are significant for very small ''n'', and converge towards 1 as <math>n \to \infty</math>.  Their values are listed for ''n'' up to 100 in [[Media:Sigma1ShotStatistics.ods]].
 
 
 
== [http://en.wikipedia.org/wiki/Bessel%27s_correction Bessel correction factor] ==
 
The Bessel correction removes bias in sample variance.
 
:&nbsp; <math>c_{B}(n) = \frac{n}{n-1}</math>
 
 
 
One way of thinking of this bias in this context is to realize that we can never observe the true center of the shot distribution.  When we calculate the center of a group on a target it will likely be some distance from the true center, and thus underestimate the true distance of the sample shots to the distribution center.  (Average distance from sample center to true center is listed in the second column of [[Media:Sigma1ShotStatistics.ods]].)
 
 
 
== [http://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation#Results_for_the_normal_distribution Gaussian correction factor] ==
 
The Gaussian correction (sometimes called <math>c_4</math>) removes bias introduced by taking the square root of variance.
 
:&nbsp; <math>\frac{1}{c_{G}(n)} = \sqrt{\frac{2}{n-1}}\,\frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)} \, = \, 1 - \frac{1}{4n} - \frac{7}{32n^2} - \frac{19}{128n^3} + O(n^{-4})</math>
 
 
 
The third-order approximation is adequate.  The following spreadsheet formula calculates <math>1/c_G(N)</math> directly: <code>=EXP(LN(SQRT(2/(N-1))) + GAMMALN(N/2) - GAMMALN((N-1)/2))</code>
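For readers working outside a spreadsheet, the same calculation can be sketched in Python.  This is only an illustration: the function names <code>c4</code>, <code>c_gaussian</code> and <code>c4_series</code> are our own labels, not part of any library, and only the standard <code>math</code> module is assumed.

```python
import math


def c4(n: int) -> float:
    """Return 1 / c_G(n) = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2).

    Log-gammas are used so that large n does not overflow.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    return math.exp(
        0.5 * math.log(2.0 / (n - 1))
        + math.lgamma(n / 2.0)
        - math.lgamma((n - 1) / 2.0)
    )


def c_gaussian(n: int) -> float:
    """Gaussian correction factor c_G(n), the reciprocal of c4(n)."""
    return 1.0 / c4(n)


def c4_series(n: int) -> float:
    """Third-order series approximation of 1 / c_G(n)."""
    return 1 - 1 / (4 * n) - 7 / (32 * n**2) - 19 / (128 * n**3)
```

Comparing <code>c4_series(n)</code> against <code>c4(n)</code> for a few small ''n'' is a quick way to judge whether the series is accurate enough for a given use.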
 
 
 
== Rayleigh correction factor ==
 
The unbiased estimator for the Rayleigh distribution is likewise an estimator of <math>\sigma^2</math>.  The following factor corrects for the bias introduced by the concavity of the square root when we take it to get ''σ''.
 
:&nbsp; <math>c_{R}(n) = 4^n \sqrt{\frac{n}{\pi}} \frac{ n!(n-1)!} {(2n)!}</math> <ref>[[Media:Statistical Inference for Rayleigh Distributions - Siddiqui, 1964.pdf|''Statistical Inference for Rayleigh Distributions'', M. M. Siddiqui, 1964, p.1007]]</ref>
 
 
 
To avoid overflows this is better calculated using log-gammas, as in the following spreadsheet formula: <code>=EXP(LN(SQRT(N/PI())) + N*LN(4) + GAMMALN(N+1) + GAMMALN(N) - GAMMALN(2*N+1))</code>
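The log-gamma approach can also be sketched in Python.  The function name <code>c_rayleigh</code> is a label chosen here for illustration; the only assumption is the standard <code>math</code> module, using the identity <math>\ln k! = \ln\Gamma(k+1)</math>.

```python
import math


def c_rayleigh(n: int) -> float:
    """Rayleigh correction factor c_R(n) = 4^n sqrt(n/pi) n!(n-1)!/(2n)!.

    Factorials are evaluated as log-gammas: ln k! = lgamma(k + 1).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.exp(
        0.5 * math.log(n / math.pi)
        + n * math.log(4.0)
        + math.lgamma(n + 1)      # ln n!
        + math.lgamma(n)          # ln (n-1)!
        - math.lgamma(2 * n + 1)  # ln (2n)!
    )
```

Evaluating the factorials directly would overflow floating point well before ''n'' = 1000, whereas the log form stays within ordinary range.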
 
 
 
== Data ==
 
In the following formulas assume that we are looking at a target reflecting ''n'' shots and that we are able to determine the center coordinates ''x'' and ''y'' for each shot.
 
 
 
(One easy way to compile these data is to process an image of the target through a program like [http://ontargetshooting.com/features.html OnTarget Precision Calculator].)
 
 
 
== Variance Estimates ==
 
For a single axis the [http://en.wikipedia.org/wiki/Bessel's_correction#Formula unbiased estimate of variance] for a normal distribution is <math>s_x^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} </math>, from which the unbiased estimate of standard deviation is <math>\widehat{\sigma_x} = c_G(n) \sqrt{(s_x^2)}</math>.
 
 
 
Since we are assuming that the shot dispersion is jointly independent and identically distributed along the ''x'' and ''y'' axes we improve our estimate by aggregating the data from both dimensions.  I.e., we look at the average sample variance <math>s^2 = (s_x^2 + s_y^2)/2</math>, and <math>\hat{\sigma} = c_G(2n-1) \sqrt{s^2}</math>.  This turns out to be identical to the Rayleigh estimator.
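A short derivation shows why the two estimates coincide.  Writing <math>r_i</math> for the distance of shot ''i'' from the sample center <math>(\bar{x}, \bar{y})</math>, so that <math>r_i^2 = (x_i - \bar{x})^2 + (y_i - \bar{y})^2</math>:

:&nbsp; <math>s^2 = \frac{s_x^2 + s_y^2}{2} = \frac{\sum (x_i - \bar{x})^2 + \sum (y_i - \bar{y})^2}{2(n-1)} = \frac{\sum r_i^2}{2(n-1)} = \frac{n}{n-1} \, \frac{\sum r_i^2}{2n} = c_B(n) \frac{\sum r_i^2}{2n}</math>

The pooled estimate has 2(''n'' - 1) degrees of freedom, the same as a single-axis sample of size 2''n'' - 1, which is why the Gaussian correction is evaluated at 2''n'' - 1.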
 
 
 
== Rayleigh Estimates ==
 
The Rayleigh distribution describes the random variable ''R'' defined as the distance of each shot from the center of the distribution.  Again, we never get to observe the true center, so we begin by calculating the sample center <math>(\bar{x}, \bar{y})</math>.  Then for each shot we can compute the sample radius <math>r_i = \sqrt{(x_i - \bar{x})^2 + (y_i - \bar{y})^2}</math>.
 
 
 
The [http://en.wikipedia.org/wiki/Rayleigh_distribution#Parameter_estimation unbiased Rayleigh estimator] is <math>\widehat{\sigma^2} = c_B(n) \frac{\sum r_i^2}{2n} = \frac{c_B(n)}{2} \overline{r^2}</math>, from which the unbiased parameter is once again <math>\hat{\sigma} = c_G(2n-1) \sqrt{\widehat{\sigma^2}}</math>
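The steps above (sample center, squared radii, Bessel correction, then Gaussian correction) can be sketched as a small Python function.  The name <code>rayleigh_sigma</code> and the input format, a list of (x, y) pairs, are assumptions made for this illustration.

```python
import math


def rayleigh_sigma(shots):
    """Return (variance estimate, sigma estimate) from (x, y) shot coordinates."""
    n = len(shots)
    if n < 2:
        raise ValueError("need at least two shots")
    # Sample center of the group
    xbar = sum(x for x, _ in shots) / n
    ybar = sum(y for _, y in shots) / n
    # Sum of squared radii measured from the sample center
    sum_r2 = sum((x - xbar) ** 2 + (y - ybar) ** 2 for x, y in shots)
    # Variance estimate: c_B(n) * sum(r^2) / (2n)
    var_hat = (n / (n - 1)) * sum_r2 / (2 * n)
    # Gaussian correction c_G(m) with m = 2n - 1, computed via log-gammas
    m = 2 * n - 1
    inv_c_g = math.exp(
        0.5 * math.log(2.0 / (m - 1))
        + math.lgamma(m / 2.0)
        - math.lgamma((m - 1) / 2.0)
    )
    return var_hat, math.sqrt(var_hat) / inv_c_g
```

The coordinates can be in any consistent unit (inches, MOA, mils); the estimate comes back in the same unit.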
 
 
 
== Confidence Intervals ==
 
Siddiqui<ref>[[Media:Some Problems Connected With Rayleigh Distributions - Siddiqui 1961.pdf|''Some Problems Connected With Rayleigh Distributions'', M. M. Siddiqui, 1961, p.169]]</ref> shows that the confidence intervals are given by the <math>\chi^2</math> distribution with 2''n'' degrees of freedom.  However this assumes we know the true center of the distribution.  We lose one degree of freedom in each dimension by using the sample center, so we actually have only 2(''n'' - 1) degrees of freedom.  To find the (1 - ''α'') confidence interval, first find <math>\chi_1^2, \ \chi_2^2</math> where:
 
:&nbsp; <math>Pr(\chi^2(2(n-1)) \leq \chi_1^2) = \alpha/2, \quad Pr(\chi^2(2(n-1)) \leq \chi_2^2) = 1 - \alpha/2</math>
 
then
 
:&nbsp; <math>\frac{2(n-1)\overline{r^2}}{\chi_2^2} \leq \widehat{\sigma^2} \leq \frac{2(n-1)\overline{r^2}}{\chi_1^2}</math>
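Because 2(''n'' - 1) is always even, the <math>\chi^2</math> quantiles can be found without a statistics library: for even degrees of freedom 2''k'' the cdf has the closed form <math>1 - e^{-x/2} \sum_{j=0}^{k-1} (x/2)^j / j!</math>.  The following Python sketch inverts that cdf by bisection.  All function names are our own, and <code>variance_interval</code> simply divides whatever numerator it is given by the two quantiles.

```python
import math


def chi2_cdf_even(x: float, df: int) -> float:
    """Chi-square cdf for even df, using the closed-form finite sum."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0            # j = 0 term
    for j in range(1, df // 2):
        term *= half / j              # (x/2)^j / j!
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_quantile_even(p: float, df: int) -> float:
    """Invert the cdf by bisection: the x with cdf(x) = p."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, df) < p:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf_even(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def variance_interval(numerator: float, n: int, alpha: float = 0.05):
    """Return (numerator / chi_2^2, numerator / chi_1^2) for 2(n-1) df."""
    df = 2 * (n - 1)
    chi_1 = chi2_quantile_even(alpha / 2.0, df)
    chi_2 = chi2_quantile_even(1.0 - alpha / 2.0, df)
    return numerator / chi_2, numerator / chi_1
```

Taking square roots of the two returned values gives the corresponding interval for ''σ''.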
 
 
 
= Examples =
 
 
 
== The 3-shot Group ==
 
[[File:3ShotSample.png|210px|thumb|right|Sample 3-shot group|Sample 3-shot group with 1/2" extreme spread. Sample center is in red. Each shot has ''r'' = .29".]]
 
A rifle builder sends you a 3-shot group measuring ½" between each of three centers to prove how accurate your rifle is.  ''What does that really say about the gun's accuracy?''
 
In the ''best'' case &ndash; i.e.:
 
# The group was actually fired from your gun
 
# The group was actually fired at the distance indicated (in this case 100 yards)
 
# The group was not cherry-picked from a larger sample &ndash; e.g., the best of an unknown number of test 3-shot groups
 
# The group was not clipped from a larger group (in the style of [http://www.ar15.com/forums/t_3_118/500913_.html the "Texas Sharpshooter"])
 
&mdash; if all of these conditions are satisfied, then we have a statistically valid sample.  In this case our group is an equilateral triangle with ½" sides.  A little geometry shows the distance from each point to sample center is <math>r_i = \frac{1}{2 \sqrt{3}} \approx .29"</math>.
 
 
 
The Rayleigh estimator <math>\widehat{\sigma^2} = c_B(3) \frac{\sum r_i^2}{6} = \frac{3}{2} \frac{1}{24} = \frac{1}{16}</math>.  So <math>\hat{\sigma} = c_G(2n - 1) \sqrt{1/16} = (\frac{4}{3}\sqrt{\frac{2}{\pi}})\frac{1}{4} \approx .27MOA</math>.  Not bad!  But not very significant.  Let's check the confidence intervals: For ''α'' = 5% (i.e., 95% confidence intervals)
 
:&nbsp; <math>\chi_1^2(4) \approx 0.484, \quad \chi_2^2(4) \approx 11.14</math>.  Therefore,
 
:&nbsp; <math>0.03 \approx \frac{1}{3 \chi_2^2} \leq \widehat{\sigma^2} \leq \frac{1}{3 \chi_1^2} \approx 0.69</math>, and <math>0.17 \leq \hat{\sigma} \leq 0.83</math>
 
so with 95% certainty we can only say that the gun's true precision ''σ'' is somewhere in the range from .17MOA to .83MOA.
 
 
 
== Inference from Extreme Spreads ==
 
What can we deduce about the precision of a gun from extreme spread samples?
 
  
[[File:RadiusBounds.png|270px|thumb|right|Extreme Spread Bounds]]
+
We do know that targets can often exhibit vertical or horizontal stringing, and therefore <math>\sigma_x \neq \sigma_y</math>.  To the extent these parameters are not equal they produce elliptical instead of circular shot groups.
Without knowing the radius of each shot we can still put upper and lower bounds on the group's radii.  The image at right shows that if we only know the Extreme Spread ''ES'' and the number of shots ''n'' in the group then we can assert the following bounds on the average radius:
 
:&nbsp; <math>\overline{r_U} = ES / 2</math> is an '''upper bound''', with <math>\overline{r_U^2} = (ES / 2)^2</math>
 
:&nbsp; <math>\overline{r_L} = \frac{1}{n} ((n - 1) \frac{ES}{n} + ES(1 - \frac{1}{n})) = \frac{2(n - 1)}{n^2} ES</math> is a '''lower bound''', with <math>\overline{r_L^2} = (n - 1) (\frac{ES}{n})^2</math>
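These bounds are simple enough to tabulate for any group size.  The sketch below is illustrative only: <code>extreme_spread_bounds</code> and <code>sigma2_bounds</code> are hypothetical names, and the second function feeds the bounded mean squared radii into the Rayleigh estimator <math>\frac{c_B(n)}{2} \overline{r^2}</math>.

```python
def extreme_spread_bounds(es: float, n: int) -> dict:
    """Bounds on mean radius and mean squared radius given only ES and n."""
    if n < 2:
        raise ValueError("a group needs at least two shots")
    return {
        "mean_r_upper": es / 2.0,
        "mean_r2_upper": (es / 2.0) ** 2,
        "mean_r_lower": 2.0 * (n - 1) * es / n**2,
        "mean_r2_lower": (n - 1) * (es / n) ** 2,
    }


def sigma2_bounds(es: float, n: int):
    """Return (lower, upper) variance estimates: c_B(n)/2 * mean r^2 bound."""
    b = extreme_spread_bounds(es, n)
    c_b = n / (n - 1)  # Bessel correction factor
    return c_b * b["mean_r2_lower"] / 2.0, c_b * b["mean_r2_upper"] / 2.0
```

For a two-shot group the upper and lower bounds coincide, since both shots must lie ''ES''/2 from the sample center.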
 
  
We can then derive confidence intervals using these bounded radii:
+
However, we know some of the significant sources of stringing and can potentially factor them out:
:&nbsp; <math>\frac{2(n-1)\overline{r_L^2}}{\chi_2^2} \leq \widehat{\sigma^2} \leq \frac{2(n-1)\overline{r_U^2}}{\chi_1^2}</math>
 
  
=== Example ===
+
# The primary source of x-specific variance is crosswind.  If we measure the wind while shooting we can bound and remove a “wind variance” term from that axis.  E.g., "Suppose the orthogonal component of wind is ranging at random from 0-10mph during the shooting.  Given lag-time ''t'' this will expand the no-wind horizontal dispersion at the target by <math>\sigma_w</math>."<ref>Wind deflection is a function of the ballistic curve and distance, but can be expressed as a simple product of the cross-wind velocity and lag time.  For more information on the "lag rule" see Bryan Litz, ''Applied Ballistics for Long Range Shooting, 2<sup>nd</sup> Edition'' (2011) A4; or Robert McCoy, ''Modern Exterior Ballistics, 2<sup>nd</sup> Edition'' (2012) 7.27.</ref>  Since variances are additive we could adjust <math>\sigma_x</math> via the equation <math>{\sigma'}_x^2 = \sigma_x^2 - \sigma_w^2</math>.
The standard precision measure given in the NRA's magazines is the minimum, maximum, and average extreme spread of five 5-shot groups.  ''Suppose they show an average group size of 1MOA: What is the implied precision parameter, and what is our confidence in it?''
+
# The primary source of y-specific variance is muzzle velocity, which we can actually measure with a chronograph (or assert) and then remove from that axis.  E.g., "If standard deviation of muzzle velocity is <math>\sigma_{mv}</math> then, given the bullet's ballistic model for the given target distance, the vertical spread attributable to that is some <math>\sigma_v</math>."  Here too we can remove this known source of dispersion from our samples via the equation <math>{\sigma'}_y^2 = \sigma_y^2 - \sigma_v^2</math>.  This adjustment is shown in several of the examples:
 +
#* [[22LR CCI 40gr HV 40-shot 100-yard Example]]
 +
#* [[300BLK Subsonic 20-shot 100-yard Example]]
  
Note that both the ''σ'' estimator and confidence intervals depend on the sample values of each shot, <math>r_i^2</math>.  We can't observe the ''r'' values directly, but we can put bounds on them.  In this case we have five groups of five shots, and we can bound each group based on its stated extreme spread.  To simplify the example let's just assume that each group had the same extreme spread of 1MOA.  So we have ''n'' = 25 shots with the same upper and lower bounds, and ''ES'' = 1.  From the formulas for the bounds and for the Rayleigh estimator:
+
= Statistical Analysis of Dispersion =
:&nbsp; <math>\widehat{\sigma_U^2} = \frac{c_B(25)}{2} (\frac{1}{4}) = \frac{25}{192} \approx .13</math>
 
:&nbsp; <math>\widehat{\sigma_L^2} = \frac{c_B(25)}{2} (\frac{24}{625}) = \frac{25}{48} (\frac{24}{625}) = \frac{1}{50} = .02</math>
 
  
Taking square roots and applying the correction <math>c_G(49) \approx 1.00522</math> we have:  
+
In view of the preceding:
:&nbsp; <math>\widehat{\sigma_L} \approx 0.14 \leq \widehat{\sigma} \leq 0.36 \approx \widehat{\sigma_U}</math>
+
# The [[Closed Form Precision]] model requires that we assume the shot group is, or can be normalized to be, a fairly symmetric bivariate Gaussian process.  This assumption is the most amenable to statistical analysis.
 +
# [[Order Statistics]] are slightly less efficient and amenable to abstract analysis, but are both more robust and easier to apply "in the field."
 +
# [[Circular Error Probable]] disregards any ellipticity in the actual shot process in order to characterize precision using a single parameter. Since most of precision estimation is for the purposes of comparing loads, rifles, and shooters, we need a single number and we don't care if the dispersion is elliptic: tighter is always better.
 +
# [[Elliptic Error Probable]] allows for a full characterization of the General Bivariate Normal model.  For some applications &ndash; e.g., computing hit probabilities on non-circular targets &ndash; we want to preserve statistically significant ellipticity.  And for a few &ndash; e.g., harmonic dampening, and perhaps shooter technique correction &ndash; the orientation of the ellipse produced by non-zero ''ρ'' could be helpful if it can be estimated.
 +
# Extreme Spread and the other [[Range Statistics]], which increase with number of shots per group ''n'', do not have any useful functional forms.  The characteristics of these measures have to be derived from Monte Carlo simulation.  They are the least efficient statistics but are also the most commonly used because they are so easy to measure in the field and so familiar to shooters.
  
Our 95% confidence intervals are based on:
+
One practical question that many shooters raise is what to do with outliers, known in the sport as "fliers." We address [[Fliers|fliers here]].
:&nbsp; <math>\chi_1^2(48) \approx 30.75, \quad \chi_2^2(48) \approx 69.02</math>.  Therefore, using the lower bound <math>\overline{r_L^2}</math> for the lower confidence interval, and the upper bound <math>\overline{r_U^2}</math> for the upper confidence interval, we have:
 
:&nbsp; <math>0.060 \leq \widehat{\sigma^2} \leq 0.17</math>, and <math>0.24 \leq \hat{\sigma} \leq 0.42</math>.
 
  
As we see in [[Predicting_Precision#Spread_Measures]] the expected extreme spread from 5-shot groups is 3.06 ''σ''.  So based on the NRA data we can at least say that with 95% certainty the average of future 5-shot group extreme spreads should be in the range (.75, 1.28)MOA.
+
= Tools =
 +
See [[Measuring Tools]] for convenient ways of measuring and analyzing precision.
  
 
= References =
 
<references />

Revision as of 16:59, 5 June 2016

Previous: Describing Precision

Models of Dispersion

We present five options for measuring and analyzing precision:

  1. Closed Form Precision
  2. Circular Error Probable
  3. Elliptic Error Probable
  4. Range Statistics
  5. Order Statistics

Before selecting one consider the following background:

General Bivariate Normal

The normal, a.k.a. Gaussian, distribution is the broadly accepted model of a random variable like the dispersion of a physical gunshot from its center point. The normal distribution is parameterized by its mean and standard deviation, or \((\mu, \sigma)\). As explained in What is Precision? we are only interested in the dispersion component, since the center point of impact is controlled by sighting in the gun (i.e., adjusting its aiming device). Therefore we will assume that a gunner can dial \(\mu \approx 0\) and leave that parameter out of the question in what follows.

Since we are interested in shot dispersion on a two-dimensional target we will look at a bivariate normal distribution, which has separate parameters for the standard deviation in each dimension, \(\sigma_x, \sigma_y\), as well as a correlation parameter ρ.

Uncorrelated Bivariate Normal

We don't have any compelling evidence that in general there is, or should be, correlation between the horizontal and vertical dispersion of gunshots. Therefore, for most of our analysis we will assume ρ = 0.

We do know that targets can often exhibit vertical or horizontal stringing, and therefore \(\sigma_x \neq \sigma_y\). To the extent these parameters are not equal they produce elliptical instead of circular shot groups.

However, we know some of the significant sources of stringing and can potentially factor them out:

  1. The primary source of x-specific variance is crosswind. If we measure the wind while shooting we can bound and remove a “wind variance” term from that axis. E.g., "Suppose the orthogonal component of wind is ranging at random from 0-10mph during the shooting. Given lag-time t this will expand the no-wind horizontal dispersion at the target by \(\sigma_w\)."[1] Since variances are additive we could adjust \(\sigma_x\) via the equation \({\sigma'}_x^2 = \sigma_x^2 - \sigma_w^2\).
  2. The primary source of y-specific variance is muzzle velocity, which we can actually measure with a chronograph (or assert) and then remove from that axis. E.g., "If standard deviation of muzzle velocity is \(\sigma_{mv}\) then, given the bullet's ballistic model for the given target distance, the vertical spread attributable to that is some \(\sigma_v\)." Here too we can remove this known source of dispersion from our samples via the equation \({\sigma'}_y^2 = \sigma_y^2 - \sigma_v^2\). This adjustment is shown in several of the examples:
       * 22LR CCI 40gr HV 40-shot 100-yard Example
       * 300BLK Subsonic 20-shot 100-yard Example
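As a rough illustration of both adjustments, here is a minimal Python sketch. Everything in it is an assumption for the sake of example: the function names are hypothetical, the crosswind is taken to be uniformly distributed between its low and high values (so its standard deviation is the range divided by \(\sqrt{12}\)), and deflection is taken as crosswind speed times lag time per the lag rule cited above.

```python
import math


def wind_sigma(wind_low: float, wind_high: float, lag_time: float) -> float:
    """Horizontal sigma from crosswind, ASSUMING a uniform wind distribution.

    Wind speeds must be in target units per second (e.g. inches/second)
    and lag_time in seconds; deflection = crosswind speed * lag time.
    """
    wind_sd = (wind_high - wind_low) / math.sqrt(12.0)  # sd of a uniform variable
    return lag_time * wind_sd


def remove_known_variance(sigma_observed: float, sigma_known: float) -> float:
    """Return sigma' where sigma'^2 = sigma_observed^2 - sigma_known^2."""
    residual = sigma_observed**2 - sigma_known**2
    if residual < 0:
        # Sample estimates are noisy, so the known term can exceed the observed one
        raise ValueError("known component exceeds observed dispersion")
    return math.sqrt(residual)
```

Raising an error when the residual is negative is one design choice among several; clamping to zero is another, but it would hide the fact that the inputs are inconsistent.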

Statistical Analysis of Dispersion

In view of the preceding:

  1. The Closed Form Precision model requires that we assume the shot group is, or can be normalized to be, a fairly symmetric bivariate Gaussian process. This assumption is the most amenable to statistical analysis.
  2. Order Statistics are slightly less efficient and amenable to abstract analysis, but are both more robust and easier to apply "in the field."
  3. Circular Error Probable disregards any ellipticity in the actual shot process in order to characterize precision using a single parameter. Since most of precision estimation is for the purposes of comparing loads, rifles, and shooters, we need a single number and we don't care if the dispersion is elliptic: tighter is always better.
  4. Elliptic Error Probable allows for a full characterization of the General Bivariate Normal model. For some applications – e.g., computing hit probabilities on non-circular targets – we want to preserve statistically significant ellipticity. And for a few – e.g., harmonic dampening, and perhaps shooter technique correction – the orientation of the ellipse produced by non-zero ρ could be helpful if it can be estimated.
  5. Extreme Spread and the other Range Statistics, which increase with number of shots per group n, do not have any useful functional forms. The characteristics of these measures have to be derived from Monte Carlo simulation. They are the least efficient statistics but are also the most commonly used because they are so easy to measure in the field and so familiar to shooters.
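To give a sense of what such a Monte Carlo simulation involves, the following Python sketch estimates the mean extreme spread of n-shot groups drawn from a symmetric bivariate normal. The function name, trial count, and seed are arbitrary choices made for illustration; only the standard library is assumed (math.dist requires Python 3.8 or later).

```python
import math
import random


def mean_extreme_spread(shots_per_group: int, trials: int = 20000,
                        sigma: float = 1.0, seed: int = 1) -> float:
    """Monte Carlo estimate of the expected extreme spread, in the units of sigma."""
    if shots_per_group < 2:
        raise ValueError("a group needs at least two shots")
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    total = 0.0
    for _ in range(trials):
        pts = [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
               for _ in range(shots_per_group)]
        # Extreme spread: largest center-to-center distance within the group
        total += max(math.dist(p, q)
                     for i, p in enumerate(pts)
                     for q in pts[i + 1:])
    return total / trials
```

With sigma = 1 the result for 5-shot groups can be compared directly against the multiple of σ quoted for expected extreme spread, and for 2-shot groups against the exact value \(\sqrt{\pi}\,\sigma\).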

One practical question that many shooters raise is what to do with outliers, known in the sport as "fliers." We address fliers here.

Tools

See Measuring Tools for convenient ways of measuring and analyzing precision.

References

  1. Wind deflection is a function of the ballistic curve and distance, but can be expressed as a simple product of the cross-wind velocity and lag time. For more information on the "lag rule" see Bryan Litz, Applied Ballistics for Long Range Shooting, 2nd Edition (2011) A4; or Robert McCoy, Modern Exterior Ballistics, 2nd Edition (2012) 7.27.