Outliers
Outliers are atypical, infrequent observations: data points which do not appear to follow
the distribution of the rest of the sample. These may represent consistent but rare traits, or
be the result of measurement errors or other anomalies which should not be modeled.
Note that this (or any other) description of outliers only applies to data that is deemed to
be a statistically significant sample of measurements. Without a statistically significant
sample, there is no generally acceptable approach to determining the difference between
an outlier and a representative measurement.
Using this description, graphs of the results can reveal evidence of outliers: occasional
data points that just don't seem to belong. A reasonable way to determine whether
apparent outliers are truly atypical and infrequent is to re-execute the tests and compare
the results to the first set. If the majority of the measurements are the same, except for
the potential outliers, the results are likely to contain genuine outliers that can be
disregarded. However, if the new results show similar potential outliers, these are
probably valid measurements that deserve consideration.
After identifying that a dataset appears to contain outliers, the next question is how many
outliers can be dismissed as "atypical, infrequent observations."
There is no set number of outliers that can be unilaterally dismissed; the limit is better
expressed as a maximum percentage of the total number of observations. Applying the
spirit of the two definitions above, a reasonable conclusion is that values lying more than
three standard deviations from the mean, and making up no more than 1 percent of the
total values for a particular measurement, are atypical and infrequent enough to be
considered outliers.
In summary, for commercially driven software development it is generally acceptable in
practice to say that values that are at least three standard deviations from the mean, and
that represent less than 1 percent of all the measurements for a particular item, are
candidates for omission from results analysis if (and only if) identical values are not
found in previous or subsequent tests. To express the same concept more colloquially:
data points that are obviously rare and strange, that can't immediately be explained, that
account for a very small part of the results, and that are not identical to any results from
other tests are probably outliers.
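The rule of thumb summarized above can be sketched in a few lines of Python. This is an
illustrative sketch only, not a prescribed implementation: the function name
outlier_candidates, its parameters, and the choice of the population standard deviation
are all assumptions made for the example, and the strict "less than 1 percent" reading of
the threshold is used.

```python
from statistics import mean, pstdev


def outlier_candidates(values, other_runs=(), sigma=3.0, max_fraction=0.01):
    """Return measurements that are candidates for omission.

    Hypothetical helper: the name, parameters, and defaults are assumptions
    for illustration, not part of any tool or library.
    """
    if len(values) < 2:
        return []

    mu = mean(values)
    sd = pstdev(values)  # assumption: population standard deviation
    if sd == 0:
        return []  # all values identical, nothing can be atypical

    # Step 1: values at least `sigma` standard deviations from the mean.
    extreme = [v for v in values if abs(v - mu) >= sigma * sd]

    # Step 2: they must represent less than `max_fraction` of all measurements.
    if len(extreme) >= max_fraction * len(values):
        return []

    # Step 3: discard any value that also appears in a previous or subsequent run.
    seen_elsewhere = {v for run in other_runs for v in run}
    return [v for v in extreme if v not in seen_elsewhere]
```

For example, a run of 199 response times of 100 ms plus a single 1000 ms reading yields
that one reading as a candidate, unless the same value also shows up in another test run
passed through other_runs, in which case it is treated as a valid measurement.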
A note of caution: identifying a data point as an outlier and excluding it from results
summaries does not imply ignoring the data point. Excluded outliers should be tracked in
some manner appropriate to the project context so that, as more tests are conducted, the
team can tell whether a pattern of concern is emerging among data points that, by all
indications, are outliers within their individual tests.
Confidence Intervals
Because determining levels of confidence in data is even more complex and time-
consuming than determining statistical significance or the existence of outliers, it is
extremely rare to make such a determination during commercial software projects. A
confidence interval for a specific statistic is the range of values around the statistic within
which the "true" statistic is likely to be located, with a given level of certainty.
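As a rough illustration of the definition, the sketch below computes a confidence interval
for the mean of a set of measurements. It assumes a normal approximation, which is only
reasonable for fairly large samples (a t-distribution is usually preferred for small ones),
and the function name mean_confidence_interval and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, confidence=0.95):
    """Return (low, high) bounds around the sample mean.

    Hypothetical helper using a normal approximation; assumes the sample
    is large enough for that approximation to be acceptable.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two measurements are required")

    m = mean(values)
    standard_error = stdev(values) / sqrt(n)  # sample standard deviation

    # Two-sided critical value, e.g. about 1.96 for 95 percent confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)

    margin = z * standard_error
    return m - margin, m + margin
```

Asking for a higher level of certainty (for example confidence=0.99) widens the interval,
while adding more measurements with similar spread narrows it.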