Statistical Methods for Comparability Studies



Fig. 26.1 Decision tree for defining critical quality attribute comparability criteria



The first step in comparability determination is defining the target CQAs (T). To assess manufacturing capability (ΔM), one must have two inputs: analytical variability (ΔA) and process variability (ΔP). Based on knowledge of structure-activity relationships, existing clinical and pre-clinical experience, and manufacturing capability, the comparability acceptance criteria (ΔC) should be proposed for each CQA.

To ensure consistency in comparability study practice, a statistics-based systematic approach is recommended. Comparing the results to the acceptance criteria allows for an objective assessment of whether or not the two products are comparable. This section outlines the general statistical methods to be used in evaluating comparability. Assessing comparability is NOT just about meeting a predefined specification/criterion; the specification/criterion for the proposed post-change product should be no wider than the variability of the pre-change product. Developing analytical methods capable of differentiating meaningful differences is therefore critical.

With those considerations, three statistical models are proposed below: (1) a non-paired head-to-head comparison where the two substances/products can be tested within a single assay run, such as sequential measurements by HPLC; (2) a paired head-to-head analytical comparison where pre-change and proposed post-change product materials can be tested simultaneously, such as the measurements in a potency assay; and (3) one-sided testing of proposed post-change product materials against a pre-existing database for the pre-change product. It is recommended that the actual study design and data analysis be carried out in close collaboration with a statistician.


26.2.1.1 Unpaired Quality Attributes


Consider the most commonly seen data structure, for example, HPLC-generated data, where non-paired observations are produced for both the proposed post-change and the pre-change product. The data structure is



$$ \begin{array}{l}{x}_{11},\cdots, {x}_{1{n}_x}\kern4.25em {y}_{11},\cdots, {y}_{1{n}_y}\\ {x}_{21},\cdots, {x}_{2{n}_x}\kern4.25em {y}_{21},\cdots, {y}_{2{n}_y}\\ \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \\ {x}_{m_x1},\cdots, {x}_{m_x{n}_x}\kern4em {y}_{m_y1},\cdots, {y}_{m_yn{}_y}\end{array} $$
where $x_{ij}=x_i^0+\delta_{ij}$, $y_{kj}=y_k^0+\varepsilon_{kj}$, $\delta_{ij}\sim N\left(0,\sigma_{\delta}^2\right)$, $\varepsilon_{kj}\sim N\left(0,\sigma_{\varepsilon}^2\right)$, $x_i^0\sim N\left(\mu_x,\sigma_x^2\right)$, and $y_k^0\sim N\left(\mu_y,\sigma_y^2\right)$. Let $x_{ij}$ be the observed $j$th critical quality attribute measurement for the pre-change (reference) product for lot (batch, or run) $i$, where $i=1,\cdots,m_x$, $j=1,\cdots,n_x$, $n_x$ is the number of measurements for each reference lot, and $m_x$ is the number of reference lots used. Let $y_{kj}$ be the observed $j$th critical quality attribute measurement for the proposed post-change (test) product for lot $k$, where $k=1,\cdots,m_y$, $j=1,\cdots,n_y$, $n_y$ is the number of measurements for each test product lot, and $m_y$ is the number of test product lots used. The $\sigma_x^2$ and $\sigma_{\delta}^2$ are the process variability and the assay variability for the reference product, respectively; similarly, $\sigma_y^2$ and $\sigma_{\varepsilon}^2$ are the process variability and the assay variability for the test product. The goal is to compare the two products (test vs. reference) and assess how similar they are.

For this type of data, formal statistical approaches such as the two one-sided tests (TOST) procedure are traditionally used to assess equivalence of means against pre-specified acceptance criteria (Chatfield et al. 2011). However, assessing how similar the test and the reference product are to each other is not necessarily straightforward, since "how similar is similar enough" is not well defined and scientific/clinical judgment may not be easily available. Thus, Liao and Darken (2013) proposed a method using a tolerance interval (TI) and a plausibility interval (PI) to define the comparability criteria. The basic idea is described here. Consider a hypothetical study where the test is also the reference. Since the reference product is an established or approved product, the test (here also the reference) should almost always be comparable to the reference (itself); any observed difference is due to chance and is clinically negligible. Thus, if any difference is within the reference variability, it is reasonable to conclude comparability between the test and the reference. Toward this end, an interval called the plausibility interval (PI) was proposed to quantify this reference variability. The assay + process PI for the difference between the reference and itself is defined as follows:



$$ \left(-k\sqrt{2\left({\sigma}_x^2+{\sigma}_{\delta}^2\right)},+k\sqrt{2\left({\sigma}_x^2+{\sigma}_{\delta}^2\right)}\right) $$

(26.1)
where the critical value $k$ is a factor to control the sponsor's tolerance, and $\sigma_x^2$ and $\sigma_{\delta}^2$ are the process variability and the assay variability for the reference product, respectively, which can be estimated from the commonly used variance decomposition method. Note that the interval in Eq. (26.1) is defined for the difference of test and reference. The PI defines the acceptable range within which the quality attribute difference between the test and the reference should fall. Any difference within this PI should be considered practically acceptable, and thus it can serve as a goalpost for judging comparability. The concept of "goal posts," whereby the attributes of the post-change product fall within the demonstrated variation of the reference over its lifetime, is becoming a widely accepted approach for creation of a "highly similar" comparable product (McCamish and Woollett 2013).
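Once the variance components are estimated, the PI of Eq. (26.1) is a direct computation. A minimal sketch, with variance values assumed for illustration:

```python
# Sketch: the plausibility interval of Eq. (26.1) for the test-reference
# difference, given reference process and assay variance estimates.
# k = 2.5 or 3 is the value Liao and Darken (2013) recommend.
import math

def plausibility_interval(process_var, assay_var, k=3.0):
    half = k * math.sqrt(2.0 * (process_var + assay_var))
    return (-half, half)

# Assumed example values: process var 4.0, assay var 1.0.
print(plausibility_interval(4.0, 1.0, k=3.0))   # approximately (-9.49, 9.49)
```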

A tolerance interval with content at least $p$ and confidence $1-\alpha$ for the difference between the test and the reference is constructed so that



$$ \Pr \left[ \Pr \left(L<Y-X<U\right)\ge p\right]\ge 1-\alpha $$

(26.2)
where $L$ and $U$ are statistics calculated from the data, and $X$ ($Y$) is the quality attribute of the reference (test). In most practical applications, $1-\alpha$ and $p$ are typically chosen from the set of values {0.90, 0.95, 0.99} (Krishnamoorthy and Mathew 2009). The choice of the combination can be used to control the risk for both the consumer and the sponsor. The interval $(L, U)$ is usually termed the approximate $\left[100\times(1-\alpha)\%\right]/\left[100\times p\%\right]$ TI. To construct this TI, the method based on the Satterthwaite approximation recommended by Krishnamoorthy et al. (2011) is used, where the lower bound $L$ and the upper bound $U$ are defined as follows.



$$ \begin{aligned} \mathrm{L}&={\widehat{\mu}}_y-{\widehat{\mu}}_x-{z}_{\left(1+p\right)/2}\sqrt{\frac{\widehat{f}\left({a}_x{s}_x^2+{a}_y{s}_y^2\right)}{\chi_{\widehat{f},1-\alpha}^2}},\ \mathrm{and}\\ \mathrm{U}&={\widehat{\mu}}_y-{\widehat{\mu}}_x+{z}_{\left(1+p\right)/2}\sqrt{\frac{\widehat{f}\left({a}_x{s}_x^2+{a}_y{s}_y^2\right)}{\chi_{\widehat{f},1-\alpha}^2}}\end{aligned} $$
where $z_{(1+p)/2}$ is the $100\times(1+p)/2$th percentile of the standard normal distribution, $\chi^2_{\hat f,1-\alpha}$ is the $100\times(1-\alpha)$th percentile of a chi-square distribution with $\mathrm{df}=\hat f$, $s_x^2$ and $s_y^2$ are the estimates of the total variance of the reference and the test product, respectively (obtainable from the commonly used variance decomposition method), $n_1=m_x\times n_x$, $n_2=m_y\times n_y$, $a_x=1+\frac{1}{n_1}$, $a_y=1+\frac{1}{n_2}$, $\hat f=\frac{\left(a_x s_x^2+a_y s_y^2\right)^2}{a_x^2 s_x^4/\left(n_1-1\right)+a_y^2 s_y^4/\left(n_2-1\right)}$, and $\hat\mu_x$ and $\hat\mu_y$ can be estimated using the weighted average.
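As an illustrative sketch of the bounds above (not the authors' code): the computation takes the estimated means, total variances, and effective sample sizes as inputs. Note that conventions for the chi-square percentile notation vary; here the quantile is taken at lower-tail area $\alpha$, under which the interval widens as the confidence increases, as a tolerance factor must.

```python
# Sketch: Satterthwaite-approximation tolerance bounds L and U for Y - X.
# Inputs are the mean estimates, total-variance estimates s2_x and s2_y,
# and n1 = m_x * n_x, n2 = m_y * n_y as defined in the text.
import math
from scipy import stats

def tolerance_interval_diff(mu_x_hat, mu_y_hat, s2_x, s2_y, n1, n2,
                            p=0.90, conf=0.95):
    a_x, a_y = 1 + 1 / n1, 1 + 1 / n2
    v = a_x * s2_x + a_y * s2_y
    # Satterthwaite effective degrees of freedom f-hat
    f_hat = v ** 2 / ((a_x * s2_x) ** 2 / (n1 - 1)
                      + (a_y * s2_y) ** 2 / (n2 - 1))
    z = stats.norm.ppf((1 + p) / 2)            # z_{(1+p)/2}
    chi2 = stats.chi2.ppf(1 - conf, f_hat)     # quantile at lower-tail 1-conf
    half = z * math.sqrt(f_hat * v / chi2)
    diff = mu_y_hat - mu_x_hat
    return diff - half, diff + half

# Assumed example values: 6 lots x 4 runs per arm, total variance 4.0 each.
print(tolerance_interval_diff(100.0, 101.0, 4.0, 4.0, 24, 24))
```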

The test and the reference are claimed comparable if: (1) the approximate $\left[100\times(1-\alpha)\%\right]/\left[100\times p\%\right]$ tolerance interval for the difference between the test and the reference defined in Eq. (26.2) falls within the plausibility interval defined in Eq. (26.1); and (2) the estimated mean ratio is within a specified boundary, for example, [0.8, 1.25].
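The two-part decision rule can be sketched as follows, with the interval endpoints assumed to be precomputed (function and argument names are hypothetical):

```python
# Sketch of the comparability decision: (1) the tolerance interval (ti) for
# the test-reference difference lies inside the plausibility interval (pi),
# and (2) the estimated mean ratio falls within the stated bounds.
def comparable(ti, pi, mu_x_hat, mu_y_hat, ratio_bounds=(0.8, 1.25)):
    inside = pi[0] <= ti[0] and ti[1] <= pi[1]   # condition (1)
    ratio = mu_y_hat / mu_x_hat                  # condition (2)
    return inside and ratio_bounds[0] <= ratio <= ratio_bounds[1]

# Assumed example values: TI (-3.1, 2.8), PI (-9.5, 9.5), means 100 and 99.7.
print(comparable((-3.1, 2.8), (-9.5, 9.5), 100.0, 99.7))   # True
```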

Note that the tolerance interval is used here instead of the commonly used confidence interval. A confidence interval's width is due entirely to sampling error, whereas the width of a tolerance interval is due to both sampling error and variance in the population. As the sample size approaches the entire population, the width of the confidence interval approaches zero, while the tolerance interval's estimated percentiles approach the true population percentiles, so its width does not vanish. When comparing to the reference-variability-scaled range, the tolerance interval is therefore the more appropriate choice.

The first condition in the comparability acceptance criteria controls false failing comparability claims, and the second condition controls false passing comparability claims. Because the approach is capability based, a large reference product variability could otherwise falsely pass a test product with a large mean difference. The additional point-estimate constraint in the second condition eliminates the possibility that such a test product would enter the market (Haidar et al. 2007, 2008).

In contrast to the traditional equivalence approach of testing only the equivalence of means, one advantage of the new approach is that it also considers the variability, not just the mean difference. The performance and feasibility of this approach were demonstrated through simulation studies and an example; details can be found in Liao and Darken (2013). The simulation results showed that the between-batch variability of the reference product plays a very important role in passing comparability. The number of batches that should be used in the comparability study depends heavily on the between-batch variability: a larger number of batches is needed when the between-batch variability is higher. At least two different batches should always be used in the head-to-head comparison. Using simulation and real data, Liao and Darken (2013) recommended using k = 2.5 or 3 in constructing the PI in Eq. (26.1). However, the actual k value can be chosen case-by-case depending on the nature of the reference product and still be consistent with health authorities' requirements.


26.2.1.2 Paired Quality Attributes


Consider the paired data structure, a comparison of the relative potency of the proposed post-change product (test) with that of the pre-change product (reference). As recommended in ICH Q6B, an in-house reference standard(s) should always be qualified and used for control of the manufacturing process and product. Thus, the relative potency data for both products are the relative potency of the reference product to the in-house standard and the relative potency of the test product to the in-house standard. The data structure is



$$ \begin{array}{l}{x}_{11},\cdots, {x}_{1{n}_1}\leftrightarrow {y}_{11},\cdots, {y}_{1{n}_1}\\ {x}_{21},\cdots, {x}_{2{n}_2}\leftrightarrow {y}_{21},\cdots, {y}_{2{n}_2}\\ \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \\ {x}_{m1},\cdots, {x}_{m{n}_m}\leftrightarrow {y}_{m1},\cdots, {y}_{m{n}_m}\end{array} $$
where $x_{ij}=x_i^0+\delta_{ij}$, $y_{ij}=y_i^0+\varepsilon_{ij}$, $\delta_{ij}\sim N\left(0,\sigma_{\delta}^2\right)$, $\varepsilon_{ij}\sim N\left(0,\sigma_{\varepsilon}^2\right)$, $x_i^0\sim N\left(\mu_x,\sigma_x^2\right)$, and $y_i^0\sim N\left(\mu_y,\sigma_y^2\right)$. Here $x_{ij}$ is the observed $j$th critical quality attribute measurement for the reference product for lot (batch, or run) $i$, $i=1,\cdots,m$, $j=1,\cdots,n_i$, where $m$ is the number of reference lots and $n_i$ is the number of measurements for reference lot $i$; $y_{ij}$ is the observed $j$th critical quality attribute measurement for the test product for lot $i$, with the same number of test product lots $m$ and $n_i$ measurements for test product lot $i$. The $\sigma_x^2$ and $\sigma_{\delta}^2$ are the process variability and the assay variability for the reference product, respectively, and $\sigma_y^2$ and $\sigma_{\varepsilon}^2$ are those for the test product. Thus, the observed data are $(i, x_{ij}, y_{ij})$, where $i=1,\cdots,m$, $j=1,\cdots,n_i$. For lot $i$, it is reasonable to assume that there exists a linear relationship $y_i^0=\alpha_i+\beta_i x_i^0$, $i=1,\cdots,m$. The goal is to compare the two products (test vs. reference) and assess how similar they are.

Given $N=\sum_{i=1}^m n_i$ paired observations $(i, x_{ij}, y_{ij})$, where $x_{ij}$ is the observation of the independent variable from the reference product and $y_{ij}$ is the observation of the response from the test product, $i=1,\dots,m$, $j=1,\dots,n_i$, and $m$ is the total number of lots, consider a linear structural measurement error model (Fuller 1987) for lot (batch, or run) $i$ as follows.



$$ {y}_{ij}={y}_i^0+{\varepsilon}_{ij}={\alpha}_i+{\beta}_i{x}_i^0+{\varepsilon}_{ij} $$

(26.3)




$$ {x}_{ij}={x}_i^0+{\delta}_{ij} $$

(26.4)
where $\varepsilon_{ij}$ and $\delta_{ij}$ are independent with normal distributions $N(0,\sigma_{\varepsilon}^2)$ and $N(0,\sigma_{\delta}^2)$, respectively. Note that both $x_i^0$ and $y_i^0$ are still random variables in a structural measurement error model (Fuller 1987). To avoid the unidentifiability problem, the reliability ratio $\lambda=\frac{\sigma_{\delta}^2}{\sigma_{\varepsilon}^2}$ is assumed fixed and known. In the comparability study setting, it is reasonable to assume the reliability ratio $\lambda=1$. The lot-level parameters are a sample from a bivariate normal distribution such that



$$ {\left({\alpha}_i,{\beta}_i\right)}^T={\left(\alpha, \beta \right)}^T+{\varDelta}_i, $$

(26.5)
where $\varDelta_i$ is bivariate normal with mean $\mathbf{0}$ and covariance matrix
$$ \Sigma =\left(\begin{array}{cc} {\sigma}_1^2 & \rho {\sigma}_1{\sigma}_2 \\ \rho {\sigma}_1{\sigma}_2 & {\sigma}_2^2 \end{array}\right) $$
Combining Eqs. (26.3), (26.4) and (26.5) results in a linear mixed-effects structural measurement error model.

Following the same ideas laid out for analyzing the unpaired critical quality attributes, a $\left[100\times(1-\alpha)\%\right]/\left[100\times p\%\right]$ tolerance interval for the difference between the test and the reference and a plausibility interval will be constructed. Thus, the assay + process PI for the difference between the reference and itself is defined as follows:



$$ \left(-k\sqrt{2\left({\sigma}_x^2+{\sigma}_{\delta}^2\right)},+k\sqrt{2\left({\sigma}_x^2+{\sigma}_{\delta}^2\right)}\right) $$

(26.6)
where the critical value $k$ is a factor to control the sponsor's tolerance, and $\sigma_x^2$ and $\sigma_{\delta}^2$ are the process variability and the assay variability for the reference product, respectively. The PI defines the acceptable range within which the difference between the test and the reference should fall, and any difference within this PI should be considered practically acceptable. Following the same recommendation as for the non-paired case, k = 2.5 or 3 is recommended for constructing the PI in Eq. (26.6). However, the actual k value can be chosen case-by-case depending on the nature of the reference product and still be in alignment with health authorities' requirements.