Can I still use the 3 sigma approach for the demonstration of a successful scale-up?
For a long time, our industry considered 3 sigma testing to be the state of the art for demonstrating similarity between scales. The test suggests similarity when all studied scale-down model runs fall within 3 standard deviations of the mean of the large-scale/manufacturing runs. This procedure has several drawbacks: it does not aim to identify differences between the means of the scales, and it rewards small sample sizes, for which the chance of passing the test is higher. In our consulting experience and interactions with regulators, whether the 3 sigma method is appropriate for demonstrating similarity is highly controversial. The new EMA paper on comparability testing sheds more light on this topic.
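To make the drawback concrete, the 3 sigma rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `three_sigma_check`, the multiplier parameter `k`, and all data values are made up for this example and do not come from any guideline or real process.

```python
from statistics import mean, stdev


def three_sigma_check(large_scale, small_scale, k=3.0):
    """Pass if every small-scale run lies within mean +/- k*SD of the large-scale runs.

    Hypothetical helper for illustration; k=3 gives the "3 sigma" rule.
    """
    center = mean(large_scale)
    spread = stdev(large_scale)  # sample standard deviation (n - 1 denominator)
    lower, upper = center - k * spread, center + k * spread
    passed = all(lower <= x <= upper for x in small_scale)
    return passed, (lower, upper)


# Invented example data (e.g. a quality attribute in arbitrary units).
large = [100.3, 99.8, 101.1, 100.6, 99.9, 100.4]
small = [99.2, 99.0, 99.3]  # clearly shifted downwards relative to large scale

passed, (lower, upper) = three_sigma_check(large, small)
shift = mean(small) - mean(large)
print(f"acceptance range: {lower:.2f} to {upper:.2f}")
print(f"mean shift: {shift:.2f}, passed: {passed}")
```

In this invented data set the small-scale mean sits more than one unit below the large-scale mean, yet every single run is inside the acceptance range, so the check passes. The check also only gets harder to pass as more small-scale runs are added, which is the sense in which it rewards small sample sizes.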
Compared to 3 sigma testing, inferential statistics (e.g. the two one-sided tests procedure, TOST) provide information on the risk associated with the comparability decision. They are therefore preferable to comparing single observed values against an acceptance criterion (e.g. 3 sigma).
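As a rough illustration of what a TOST analysis involves, the sketch below tests whether the difference in means between two scales lies within an equivalence margin. Several things here are assumptions made for the example, not part of the EMA paper or of any specific guideline: the Welch (unequal variance) form of the test, the function names `t_cdf` and `tost_welch`, the margin values, and the data. To stay dependency-free, the t distribution is integrated numerically; in practice a validated statistics package would be the more sensible choice.

```python
import math
from statistics import mean, variance


def t_cdf(x, df, steps=2000):
    """CDF of Student's t distribution, integrated with Simpson's rule."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(t):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

    h = abs(x) / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(abs(x))
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def tost_welch(small_scale, large_scale, margin, alpha=0.05):
    """Two one-sided Welch t-tests for equivalence of means within +/- margin."""
    n1, n2 = len(small_scale), len(large_scale)
    v1 = variance(small_scale) / n1  # squared standard error, small scale
    v2 = variance(large_scale) / n2  # squared standard error, large scale
    diff = mean(small_scale) - mean(large_scale)
    se = math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p_lower = 1.0 - t_cdf((diff + margin) / se, df)  # H0: diff <= -margin
    p_upper = t_cdf((diff - margin) / se, df)        # H0: diff >= +margin
    p_value = max(p_lower, p_upper)
    return diff, p_value, p_value < alpha


# Invented example data; the margins are hypothetical and would normally be
# justified from process knowledge before any data are collected.
small = [98.1, 99.4, 100.2, 99.0]
large = [100.3, 99.8, 101.1, 100.6, 99.9, 100.4]

for margin in (3.0, 1.5):
    diff, p_value, equivalent = tost_welch(small, large, margin)
    print(f"margin={margin}: diff={diff:.3f}, p={p_value:.4f}, equivalent={equivalent}")
```

Unlike the 3 sigma check, the conclusion here comes with an explicit error rate: equivalence is claimed only when both one-sided tests reject at level `alpha`, so too few or too variable runs lead to a failure to demonstrate equivalence rather than an easy pass.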
“Similarity criteria solely based on plans to compare single observations (e.g. of test batches) to a pre-defined acceptance range (based on reference data) are usually unsuitable to allow for reliable inference to the underlying general manufacturing process” – EMA reflection paper on comparability analysis – draft.