From ShotStat

Herb, 4/19/2015

RE: "Extreme Spread is not only a statistically inefficient measure but also one frequently and easily abused."

The most frequent abuse of extreme spread is chasing the "best group size" (the smallest group). The smallest group size is absolutely meaningless. The valid estimator is the average group size. If you want a smaller group size, just shoot more groups. Sooner or later you'll get lucky and shoot an even smaller group.
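A quick simulation can illustrate the point. This is only a sketch: it assumes 5-shot groups with equal, uncorrelated Gaussian deflection (sigma = 1) in both axes, and the function names are made up for the illustration. The smallest group keeps shrinking as more groups are shot, while the average group size stays put.

```python
import itertools
import math
import random
import statistics


def extreme_spread(shots):
    """Largest center-to-center distance between any two shots."""
    return max(math.dist(a, b) for a, b in itertools.combinations(shots, 2))


def shoot_group(rng, n_shots=5, sigma=1.0):
    """Simulate one group (assumed model: independent Gaussian H and V)."""
    return [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n_shots)]


def smallest_and_mean(rng, n_groups):
    """Return (smallest group size, average group size) over n_groups groups."""
    sizes = [extreme_spread(shoot_group(rng)) for _ in range(n_groups)]
    return min(sizes), statistics.mean(sizes)


rng = random.Random(2015)
for n_groups in (5, 50, 500):
    smallest, mean = smallest_and_mean(rng, n_groups)
    print(f"{n_groups:4d} groups: smallest = {smallest:.2f}, mean = {mean:.2f}")
```

Nothing about the simulated gun changes between rows; only the number of chances to get lucky does.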

Herb 5/11/2015

Danielson's 2-shot method is very inefficient. Assuming that horizontal and vertical deflection are both Gaussian and equal, and that the correlation coefficient is zero, then the gold standard is the radial standard deviation which is 100% efficient. In Danielson's 2-shot method he analyzed two different brands of ammo. He used 24 shots of each type, but only got 12 measurements per type. Combining all 24 shots for each type and analyzing using the Rayleigh model would be 100% efficient.

I believe the example worked out in this spreadsheet shows how 2-shot samples can be transformed to provide an efficient sample set for the Rayleigh model. The only trick to note is that each pair represents two observations, not just one. Thus from 24 shots we have 24 radii measurements (though only 12 are unique) and this allows us to compute the Rayleigh MLE for a 24-shot sample. If there is an error in that math or example please note it. David (talk) 13:00, 15 May 2015 (EDT)
The problem is that for 24 shots the Rayleigh method would have one average position for the horizontal and vertical position. By just cutting the difference between pairs of shots in half, you have 12 average positions for the horizontal and vertical deflection. User: Herb 15 May 2015, 6:24 EST

The "best" number of shots per group depends on the % of flyers. With no flyers, groups of 5-7 shots are about equally efficient and are "best". A high % of flyers would mean that a lower number of shots per group would be better.

Can you describe a statistically unbiased method of identifying flyers? David (talk) 13:00, 15 May 2015 (EDT)
You can't describe a distribution for flyers since the distribution function for flyers and its parameters are unknown. All that can be done is to "trim" the data based on the analytical model being used. So essentially you'd set a clip level and throw out the "worst" (largest) 5% of the measurements based on simulation. There are probably multiple ways to trim data too. For instance think of group size for 5-shot groups. You could set clip levels at the highest and lowest 2.5% levels based on simulation. You could also compare 5-shot group size to 4-shot group size for every group. So let's assume that an "average" 5-shot group is 1 inch. I have a 5-shot group which is 6 inches, but has 4 shots that are in a group of 0.75 inches. The 5-shot to 4-shot ratio would put this group well beyond the clip level of the largest 5% of simulated groups and thus the 5-shot group could be viewed as "abnormal". That is the rub with statistics. I can calculate the exact probability of flipping a penny and getting a hundred tails in a row even though such a result is practically impossible. So if someone did flip a hundred tails in a row, you would have to be very skeptical that the flips were fair. You can't use statistics to "prove" that the flips were unfair, you can only infer that the result was highly unusual at some confidence level. User: Herb 15 May 2015, 6:24 EST
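The 5-shot to 4-shot ratio test could be sketched roughly as follows. This is an illustration under assumptions not stated above: the flyer-free model is taken to be equal, uncorrelated Gaussian deflection, the "4-shot group" is taken to be the tightest four of the five shots, and the function names and example coordinates are invented for the sketch.

```python
import itertools
import math
import random


def extreme_spread(shots):
    """Largest center-to-center distance between any two shots."""
    return max(math.dist(a, b) for a, b in itertools.combinations(shots, 2))


def five_to_four_ratio(shots):
    """5-shot group size divided by the size of the tightest 4-shot subset."""
    best_four = min(extreme_spread(sub) for sub in itertools.combinations(shots, 4))
    return extreme_spread(shots) / best_four


def clip_level(trials=5000, keep=0.95, seed=5):
    """Simulated ratio below which `keep` of flyer-free 5-shot groups fall."""
    rng = random.Random(seed)
    ratios = sorted(
        five_to_four_ratio([(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(5)])
        for _ in range(trials)
    )
    return ratios[int(keep * trials)]


# Hypothetical group: four shots within 0.75 inches plus one shot 6 inches out.
example = [(0.0, 0.0), (0.75, 0.0), (0.3, 0.2), (0.4, -0.1), (6.0, 0.0)]
level = clip_level()
ratio = five_to_four_ratio(example)
print(f"clip level = {level:.2f}, example ratio = {ratio:.2f}, abnormal = {ratio > level}")
```

The ratio is scale-free, so the clip level does not depend on the sigma used in the simulation. Exceeding it only marks the group as unusual under the assumed model; it does not prove that the fifth shot was a flyer.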

Think of Danielson's method like an ANOVA problem. The ((difference in distance)/2)^2 between two shots in a group gives variance within a group. But each of the 12 groups has its own center. There is also an "overall center" which is the center of the 12 centers. Thus there is also a variance between centers for the 12 groups. For just two shots per group the variance between group centers is very significant compared to the variance within a group.
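The ANOVA analogy can be written out as a sum-of-squares split. This is a sketch under the assumptions stated earlier (equal, uncorrelated Gaussian deflection in both axes). The symbols are notation introduced here: \(\mathbf{x}_{ij}\) is shot \(j\) of pair \(i\), \(\bar{\mathbf{x}}_i\) is the center of pair \(i\), and \(\bar{\mathbf{x}}\) is the center of all 24 shots.

\[
\underbrace{\sum_{i=1}^{12}\sum_{j=1}^{2}\lVert \mathbf{x}_{ij}-\bar{\mathbf{x}}\rVert^{2}}_{\text{total, } 46 \text{ d.f.}}
=
\underbrace{\sum_{i=1}^{12}\sum_{j=1}^{2}\lVert \mathbf{x}_{ij}-\bar{\mathbf{x}}_{i}\rVert^{2}}_{\text{within pairs, } 24 \text{ d.f.}}
+
\underbrace{2\sum_{i=1}^{12}\lVert \bar{\mathbf{x}}_{i}-\bar{\mathbf{x}}\rVert^{2}}_{\text{between centers, } 22 \text{ d.f.}}
\]

The within-pair term for pair \(i\) equals \(d_i^2/2\), where \(d_i\) is the distance between its two shots. A two-shot analysis uses only the within-pairs term, so the information carried by the between-centers term goes unused.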

It is an egregious error to think of Danielson's method as "efficient". Yes two-shot group size for just one group is 100% efficient. The inefficiency is the result of having multiple two-shot groups. 100% efficiency would be to use the 24 shots in a single Rayleigh distribution analysis which would have one center for all 24 shots and 24 radii from that lone center.
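The size of the loss can be checked by simulation. The sketch below is an illustration, not Danielson's procedure: it assumes sigma = 1 with equal, uncorrelated Gaussian deflection, estimates sigma^2 from the same 24 shots in two ways (one center for all shots versus 12 separate pair centers), and compares how much the two estimates scatter. The function names are invented for the example. Counting degrees of freedom (46 pooled versus 24 paired) suggests a variance ratio near 24/46.

```python
import random
import statistics


def pooled_variance(shots):
    """Unbiased sigma^2 estimate using one center for all shots."""
    n = len(shots)
    cx = sum(x for x, _ in shots) / n
    cy = sum(y for _, y in shots) / n
    ss = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in shots)
    return ss / (2 * (n - 1))  # 2 axes, n - 1 degrees of freedom each


def paired_variance(shots):
    """Unbiased sigma^2 estimate using only distances within consecutive pairs."""
    pairs = list(zip(shots[0::2], shots[1::2]))
    # Squared distance / 2 is the within-pair sum of squares about the pair center.
    ss = sum(((x1 - x2) ** 2 + (y1 - y2) ** 2) / 2 for (x1, y1), (x2, y2) in pairs)
    return ss / (2 * len(pairs))  # 2 axes, 1 degree of freedom per pair each


def compare(trials=4000, n_shots=24, sigma=1.0, seed=24):
    """Return (mean, variance) of each estimator over repeated 24-shot samples."""
    rng = random.Random(seed)
    pooled, paired = [], []
    for _ in range(trials):
        shots = [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n_shots)]
        pooled.append(pooled_variance(shots))
        paired.append(paired_variance(shots))
    return (statistics.mean(pooled), statistics.variance(pooled),
            statistics.mean(paired), statistics.variance(paired))


m_pool, v_pool, m_pair, v_pair = compare()
print(f"pooled: mean = {m_pool:.3f}, variance = {v_pool:.4f}")
print(f"paired: mean = {m_pair:.3f}, variance = {v_pair:.4f}")
print(f"relative efficiency of pairs = {v_pool / v_pair:.2f}")
```

Both estimators average close to the true sigma^2, so the difference is one of scatter rather than bias.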

The situation where group size using 2-shot groups would be most appropriate would be when testing for a high percentage of "outliers". Here I'm using "outlier" to mean a real aberration, not just a shot from the distribution with a wide spread. Otherwise 5-7 shots per group would be more statistically efficient.

Herb (talk) 15:59, 24 May 2015 (EDT)

When I did the math I found that, despite the fact that breaking the sample into groups of two shots throws away information about the combined center, the Rayleigh estimate and confidence intervals are identical. I believe this is easy enough to show with the Rayleigh MLE formula. I'll put this on my queue to prove. David (talk) 18:05, 24 May 2015 (EDT)
No way that is true. I'm at a loss as to how to try to explain the situation better.
Herb (talk) 21:32, 24 May 2015 (EDT)
You should smell a rat in all of this. Look at page 4 of pdf file for figure "Estimation of Efficiency By Group Size" (lower left of page 44 of text). This assumes no flyers. At first as there are more shots in a group the efficiency increases because the group center becomes a better estimate of the population center. However since the group size "only uses" two shots, as the group gets more shots the efficiency goes down. ("Only uses" isn't absolutely true but overall trend is absolutely correct.) The efficiency of 5-7 shot groups is about same and "most efficient" (again neglecting flyers which is a huge assumption).
Herb (talk) 22:59, 24 May 2015 (EDT)
I had a chance to review this and you're right: I was making too strong a claim. What I had found is an isomorphism between the T-test and the Rayleigh confidence measure for hypothesis testing. So no: you can't get an efficient estimate of precision using 2-shot groups. And, obviously even for hypothesis testing you will be able to reject hypotheses with greater certainty if you sample from larger groups. I'll remove that section from the main page! David (talk) 13:33, 25 May 2015 (EDT)

What is a gun?

Herb changed "gun" to "projectile weapons firing a single projectile on a vertical target within the line of sight. A good example would be target shooting with a rifle or pistol. Such weapons as shotguns, mortars, and ballistic missiles would have some similar characteristics, but also have factors that are neglected in the discussions and measurements." I could see changing it to something like "ballistic weapons," but can you explain the other restrictions and caveats? For example, I'm thinking:

  1. Why is it only applicable to single-projectile weapons? I suspect everything applies to multi-projectile firearms without qualification. David (talk) 17:43, 27 May 2015 (EDT)
Rifle, pistol, or crossbow as opposed to shotgun or cluster bomb. For bird hunting with a shotgun you're interested in how big the biggest hole in the pattern of shot is.
Herb (talk) 02:38, 28 May 2015 (EDT)
  2. Why does the target have to be vertical? In the symmetric case it has to be orthogonal to the projectile's trajectory (or else its ellipticity has to exactly match the deflection), but in all other cases even that is not necessary. David (talk) 17:43, 27 May 2015 (EDT)
   I just would like to change all the (X,Y) coordinate stuff to (H,V) so that there aren't two coordinate systems. Also trying to avoid all stuff about how to adjust sights for shooting uphill or downhill, or shooting over a hill onto a slope.
   In terms of external ballistics it would be fascinating to compare horizontal shots to vertical shots. A vertical range is hard to find. :-)
Herb (talk) 02:38, 28 May 2015 (EDT)
The key trait here is really that the target is perpendicular to the line of fire. Obviously if the target were at an angle to the line of fire then elliptical groups would result.
Herb (talk) 12:33, 28 May 2015 (EDT)
  3. Why does the target have to be line of sight? If I shoot over or through a visual obstacle my samples are just as valid as if I can see the point of impact from my firing position. David (talk) 17:43, 27 May 2015 (EDT)
For long range weapons there are all sorts of corrections for air density, rotation of earth and so on due to long time of flight. See:
Herb (talk) 02:38, 28 May 2015 (EDT)
The gist is to make the wiki about target analysis, not sight adjustment. In reality if you shoot at different distances, then you must adjust POA for distance. So the definition should say something about a fixed distance. Including external ballistics just opens up a massive number of factors.
Herb (talk) 10:16, 28 May 2015 (EDT)
  4. What additional factors do mortars and ballistic missiles have that need to be incorporated in the models here for the analyses to apply? David (talk) 17:43, 27 May 2015 (EDT)
There are mathematical models that figure how close a bomb has to hit to be considered a kill. So impact isn't a point but an area with a large radius. In such models the probability of a "kill" depends on how close. So part of the problem is to figure how many rounds for a kill.
Herb (talk) 02:38, 28 May 2015 (EDT)
I don't find these to be very convincing exceptions for the purposes of this site. Yes, there are many other good questions concerning the performance of a shotgun or explosive, and there are branches of ballistics not only for external and terminal effects, but also for longer-range and high-trajectory effects (and probably underwater, exo-atmospheric, etc.). But if you want to talk about ballistic precision then everything on this site is applicable to that question. I.e., it doesn't matter whether you're lobbing ICBMs or flinging an atlatl: if you take shots on a two-dimensional target and try to hold relevant variables constant then this site talks about how you can characterize and predict your precision from sample impacts. Or am I missing something? David (talk) 11:45, 28 May 2015 (EDT)
I really don't think that you want to expand the wiki to include external ballistics. The wiki should concentrate on doing one thing well: analyzing the holes in a single target. There is absolutely no point in including a massive topic for which the wiki has no hope of being authoritative.
Herb (talk) 12:33, 28 May 2015 (EDT)
I think we're having a violent agreement. I was trying to be cutesy and add restrictions to exclude external ballistics to avoid the camel's nose under the tent. The better way would be just to explicitly state: The wiki assumes an extremely simple notion of external ballistics. On average the POI will be the POA.
Herb (talk) 12:45, 28 May 2015 (EDT)
Something like that, but not exactly that, because the fact that you can't know POI = POA is a significant factor addressed here and here. We acknowledge that external ballistics exists in several places (as we do tacitly with internal ballistics) without bringing them in scope. So would it be better to say something like, "We do not address the nuances of internal, external, or terminal ballistics. Rather, we address the question of what the marks of ballistic projectiles on a two-dimensional target can tell us about the precision of the system that produces them." David (talk) 16:49, 28 May 2015 (EDT)
Perhaps it would be better to just expand the mention of these variations in What_is_Precision?#Precision_in_Shooting? David (talk) 11:50, 28 May 2015 (EDT)

I took another swing at this...

Note that a closed form does not mean formula. For example, if you want to find where the 56.674th percentile of the normal distribution is, then there is no formula that does this to any desired precision. It is simple and quick, however, for a computer function to use successive approximations on the CDF to determine the value to any desired precision.

I also cut the link to the "Closed form" wiki page for the moment. That needs to be redone as "The Rayleigh Distribution", which is one of four special cases of the general bivariate normal.

Herb (talk) 21:05, 29 May 2015 (EDT)

I'm seeing some good improvements. I don't see a broken link, but in case there was any question: a rule for a live site is to not break links; only redirect when the new content or version is ready. (And for major refactorings I think the approach of building new pages is preferable to rebuilding in place when not too inconvenient.)
Ugh. It wasn't the "closed form" link, but the "sample size" link that I took out. I took it out for the moment. I think some sort of "Sample Size Review" page is needed. Of course the "real" sample size (number of shots) needed depends on the measurement type and how many measurements are taken (e.g., thirty shots as five 6-shot groups).
Herb (talk) 22:16, 29 May 2015 (EDT)
Also you're tripping over yourself with the "closed form" semantics. I agree that it's a term of art, but for people who don't know what it means it's better to just leave it unless you can find something roughly as concise but more broadly illuminating. I think "formulaic" is an acceptable description of the characteristic that separates the approaches. Maybe you prefer "formal." "Amenable to deterministic, polynomial-time expansion..." is definitely not, even if it were correct. David (talk) 21:28, 29 May 2015 (EDT)
I agree that introducing deterministic/polynomial time is overly geeky. How about "computer function" as opposed to "formula"? I think that would split the difference between us nicely here.
Herb (talk) 22:06, 29 May 2015 (EDT)
I still don't understand the problem. A formal (or closed-form) solution is amenable to all sorts of analysis, symbolic manipulation, and proof that Monte Carlo solutions are not. As soon as you have a PDF, moment generating functions, MLEs, etc., you're in a rich world built upon centuries of such work. You have a fundamental understanding that the brute force of Monte Carlo simulations can't provide. Granted, sometimes to "get a number out" of formal equations one resorts to numerical methods that are most convenient on a computer. I guess I'm not certain of your objection: Do you not see a stark difference between Monte Carlo and formal? Or are you just struggling with the semantics of how to discriminate between the two? David (talk) 23:34, 29 May 2015 (EDT)
It is a semantics problem to me. Given that you want to find out how many \(\sigma\)'s above the mean the 56.674th percentile of the normal distribution is, there is no formula to do this to any desired precision. It is simple and quick, however, for a computer function using successive approximations on the CDF of the normal distribution to determine the value to any desired precision. Since we can "quickly" get an answer by successive iteration, there is a closed form for the problem even though there is no equation which will give you the answer. For Monte Carlo there is an algorithm, but not really a computer function that will do the calculation. (Perhaps splitting hairs on how a computer function is defined here.) Requiring only a formula for PDF or CDF to have a closed form is too restrictive.
Herb (talk) 03:22, 30 May 2015 (EDT)
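The successive-approximation idea can be sketched as a bisection search on the normal CDF. This is one possible illustration, not a reference implementation: the function names, the search bracket of ±10 standard deviations, and the tolerance are choices made for the example.

```python
import math


def normal_cdf(z):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_quantile(p, tol=1e-12):
    """Find z such that normal_cdf(z) = p by repeatedly halving a bracket."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    lo, hi = -10.0, 10.0  # assumed bracket; too narrow for extremely small or large p
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if normal_cdf(mid) < p:
            lo = mid  # the answer lies above mid
        else:
            hi = mid  # the answer lies at or below mid
    return (lo + hi) / 2.0


z = normal_quantile(0.56674)
print(f"56.674th percentile is {z:.6f} sigma above the mean")
print(f"check: CDF at that point = {normal_cdf(z):.6f}")
```

Each pass halves the bracket, so roughly 45 iterations take the initial width of 20 down to the 1e-12 tolerance. Python's standard library offers the same calculation ready-made as `statistics.NormalDist().inv_cdf(p)`.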
One of the notable features of this site's content is that it contains the most substantial and extensive closed-form/formal analysis of the problem yet seen. Visitors who don't care about that can stick to the pages that show empirical approaches to the problem. Noting that "equations don't directly produce numeric solutions" is a general observation that is out of scope. E.g., we're not offering a general primer on Monte Carlo analysis, either. Although I guess it wouldn't hurt to provide links to good primers if you want.
That said, you can provide the numeric functions where a typical reader might appreciate them. For example, I provide them for two of the correction factors since those aren't readily available the way, say, spreadsheet formulas for the normal distribution are. I also note a means of calculating the Rice CDF because that is so esoteric that it might be hard for someone to find. But if somebody is reading through the Closed Form Precision pages we have to assume some understanding of – and interest in – formal math. David (talk) 10:41, 30 May 2015 (EDT)
To avoid the math/computer science geekiness, how about the phrase "explicit solution"?
 :-( The wiki page on "Closed Form" isn't about closed form per se, it is about using the Rayleigh Distribution to derive explicit solutions for some of the measures.
Well right now the only closed-form (or however you describe it) model we have is that provided by the Rayleigh distribution. Hopefully there exist useful formal descriptions for the more general assumption cases as well and we're just waiting for somebody to work them out. In the meantime we show some explicit but "not formal" solutions for the other measures and assumptions. I guess we just need a roadmap with something like: "Mathematicians start here, everyone else start there." The exact words don't matter so much as providing a clear and accurate map for visitors. Unfortunately this thread shows that words to explain this dichotomy are not easy. Maybe a better structure is, "Formal math here, applied math there, and if you just want the answers skip right to the end." I sort of tried to set it up that way by giving useful answers and linking back to the supporting math for those who might be interested. David (talk) 20:36, 30 May 2015 (EDT)