Let me give you a different example. Let's assume that when you flip a coin, it always lands as either heads or tails (no landing on the edge or anything like that). Let's also assume that each outcome is equally likely. This means that there is a 50% chance of heads and a 50% chance of tails, or, expressed another way, a probability of 0.5 for heads and 0.5 for tails.
Now, while the chance of getting tails twice in a row is 0.5 * 0.5, or 0.25, this does not mean that the chances are altered on the second coin toss: each flip is independent of the ones before it. If I flip heads twenty times in a row, then flip again, the chances are still fifty-fifty on that twenty-first flip. The same is true for your issue. You can have a 6% chance of a 'true' event, but that does not guarantee that exactly 6% of all events will be true. You might have 100 events and get 6 true events. You might get 1 true event, or none at all. There's even a slim chance that every event will come out true.
The larger your sample size (i.e., the more events you have), the closer the observed proportion of true events is likely to be to 6%. However, if you have a small sample size, the observed proportion can be badly skewed. Imagine you have only one event, and it is either true or false. Obviously, these outcomes are mutually exclusive; it cannot be both true and false. If it comes up true, then 100% of your samples (your sample size being one) were true. The inverse is the case if it comes up false.
See where this is going? With any statistical analysis, your figures are estimates unless you examine every single possible event (this is the distinction between a sample and the population). Let's say I wanted to find out what percentage of people in the world are female. The only way to know for certain is to examine every single person in the world. This is obviously not practical, so I'd instead take a sample of the world's population and base my figures on that. If I examined 1000 people, and 507 of them were female, I'd say that 50.7% of the population is female. However, depending on various factors, I could be way off. Say I took a sample of 1000 people, and they were all female. Would that mean everyone in the world is female? Clearly, that's not the case. Would it be explained if you found out that I drew my sample from all-girls schools?
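For the 507-out-of-1000 example, a common back-of-the-envelope way to quantify "how far off" is the standard error of a proportion. This assumes a simple random sample and uses the normal approximation (a 95% Wald interval), which is a standard textbook choice rather than the only one:

```latex
\begin{aligned}
\hat{p} &= \frac{507}{1000} = 0.507 \\
\mathrm{SE}(\hat{p}) &= \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
  = \sqrt{\frac{0.507 \times 0.493}{1000}} \approx 0.0158 \\
\hat{p} \pm 1.96\,\mathrm{SE}(\hat{p}) &\approx 0.507 \pm 0.031 = [0.476,\ 0.538]
\end{aligned}
```

Note that this only measures random sampling error. It says nothing about bias: for the all-girls-schools sample, the estimate is 1 and the formula gives a standard error of exactly 0, a confident and completely wrong answer.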
As you can see, it's not always so simple. When you're measuring things that exist in the real world, you need to be aware of any bias that is introduced, intentionally or otherwise. When you're dealing with a mechanism to generate simulated data, like in your program, you need to be aware that you might not get the results you expect. For example, pseudo-random number generators are often misused by programmers. Say I have a function that returns a random integer in the range 0 to 15, inclusive. I want a number in the 0 to 4 range (five different possibilities). The most common solution is to use the modulo operator to reduce the number:
```c
int x = randomIntZeroToFifteen(); // hypothetical: uniformly random integer in 0..15
int y = x % 5;                    // reduce to the range 0..4

// x   y
// 0   0
// 1   1
// 2   2
// 3   3
// 4   4
// 5   0
// 6   1
// 7   2
// 8   3
// 9   4
// 10  0
// 11  1
// 12  2
// 13  3
// 14  4
// 15  0
```
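See how the number 0 is generated four times (from x = 0, 5, 10 and 15), but each other number only occurs three times? If x is uniformly distributed, y is 0 with probability 4/16 = 0.25, but each of 1 to 4 with probability only 3/16 = 0.1875. This is an unequal distribution, which probably isn't what I want.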