Nonclinical Safety Assessment: An Introduction for Statisticians



Fig. 9.1
Nonclinical safety studies which may be conducted for a typical drug project



Even before any ‘wet work’ is conducted to support the safety aspects of a drug project, in silico evaluations may be performed e.g. to identify key moieties within a molecule that are associated with toxicity. These in silico predictions, however, are only ever as good as the data that supports them from previous studies. Other early safety work includes in vitro safety screening studies, such as those used in cardiovascular risk screening (see Chap. 8). Such studies provide an early indication of which substances have the potential to cause specific types of adverse effects or harm to humans. By their nature, these studies flag up a potential hazard rather than quantifying the risk to humans, as the translation from in vitro screen to clinical effect may not be fully established. However, the early warnings allow scientists to either control or eliminate the hazard (e.g. by moving to another chemical series), or to conduct additional studies for further information. These additional studies may be a well-defined cascade, or a set of specific investigative studies.

Investigative toxicology studies may be in vitro or in vivo and can be initiated at any point during the drug discovery and development process in response to the identification of a potential safety concern. Their purpose is to better understand the hazard, and the likelihood that this will become an unacceptable clinical risk. Since these studies are by definition bespoke, that is, tailored to the particular question under investigation, many of the general principles mentioned during the remainder of this chapter will be relevant for the statistician supporting such studies.

Whilst early studies tend to be high or medium throughput so that many substances can be tested (low cost), most safety studies of a more standard nature are run later when the number of substances or series being considered for any drug project is greatly reduced (high cost). The statistician should be aware of the International Conference on Harmonisation (ICH) guidelines which cover the remaining types of safety studies shown in Fig. 9.1, with specific guidance for biotechnological products. The ICH meets at least twice a year and brings together the regulatory authorities of Europe, Japan and the USA with the aim of harmonising pharmaceutical testing requirements for all pharmaceutical agents under four categories: Safety, Quality, Efficacy and Multi-disciplinary.

The following sections provide a brief introduction to each family of studies covered by ICH Safety Guidelines S1 through S8. We summarise briefly the purpose of the studies and highlight for the statistician some relevant points and sources of further information. Readers should, in addition, be aware of the multi-disciplinary guideline ICH M3(R2) (ICH 2009). This represents the consensus that exists regarding the type and duration of nonclinical safety studies and their timing to support the conduct of human clinical trials and marketing authorisation for pharmaceuticals. This includes the duration of repeated dose studies, specialist toxicology areas (e.g. phototoxicity), combination studies and the use of biomarkers in exploratory clinical studies. For a good overall introduction to statistical methods used in toxicology studies see Jarvis et al. (2011).



9.1.2.2 Carcinogenicity Studies


The aim of a carcinogenicity study is to identify a test substance’s tumorigenic potential in animals and to assess the relevant risk in humans. Typically, 2-year carcinogenicity studies will support the registration of medicines with an intended patient use of more than 6 months. A carcinogenicity study is usually required before an application can be made for marketing approval of a pharmaceutical, but not necessarily prior to the conduct of large scale clinical trials as indicated in ICH M3(R2) (ICH 2009). ICH S1A (ICH 1995), S1B (ICH 1997) and S1C(R2) (ICH 2008) provide guidelines for rodent carcinogenicity studies, but at the time of writing a change to the S1 guidelines is proposed: to introduce a more comprehensive and integrated approach to addressing the risk of human carcinogenicity of pharmaceuticals, and to clarify and update the criteria for deciding whether the conduct of a 2-year rodent carcinogenicity study of a given pharmaceutical would add value to this risk assessment. Chapter 12 provides results of recent research exploring some aspects of the experimental design and analysis of carcinogenicity studies.


9.1.2.3 Genotoxicity Studies


ICH S2(R1) (ICH 2011a) describes the standard studies used in genetic toxicology and provides guidance on interpretation of results. In vitro and in vivo tests are used to detect substances that induce genetic damage to DNA, the goal being to characterise the risk for carcinogenic effects originating from changes in genetic material. These studies are typically only needed for small molecules. The regulatory genetic toxicology screening cascade will generally include: a bacterial gene mutation assay (e.g. Ames), an in vitro mammalian cell assay for gene mutation and/or chromosome aberrations (e.g. in vitro micronucleus assay, in vitro mouse lymphoma Tk gene mutation assay), and an in vivo assay for chromosomal effects (e.g. rodent micronucleus). For an introduction to the study design and statistical power considerations for the widely used rodent micronucleus test, statisticians are referred to Hayes et al. (2009).


9.1.2.4 Toxicokinetics and Pharmacokinetics


Pharmacokinetics (PK) is the study of the fate of substances within the body over time and of how the body absorbs, distributes, metabolises, and eliminates a substance. PK parameters (e.g. clearance) are calculated using the concentrations of substances measured in biological matrices, usually plasma. Toxicokinetics (TK) is the description of the systemic exposure of a substance in toxicity studies using PK parameters. The overall aim of TK studies is to relate the exposure achieved in animals to the substance dose level and the time course of a toxicity study. TK data also play a role in the clinical arena, assisting the setting of limits for human exposure and the calculation of safety margins. ICH S3A (ICH 1994a) and S3B (ICH 1994b) provide guidance on TK and PK studies. Chapter 11 contains further information on PK measurement.

Toxicity studies which may be usefully supported by toxicokinetic information include studies of single and repeated dose toxicity, reproductive toxicity, genotoxicity, carcinogenicity and safety pharmacology. TK data may be obtained from all animals on a toxicity study, from representative subgroups, from satellite groups or in separate studies. A separate assessment of the effect of repeated dosing on the accumulation of the substance and/or metabolites within tissues may be needed in some situations.

Statisticians can provide important input, for example in advising when to transform data and in ensuring that data summaries include appropriate estimates of variability: the inter-individual variation of kinetic parameters is often large. Small numbers of animals are usually involved in generating TK data and understanding individual animal responses may be of more value than a refined statistical analysis of group data.
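The point about transformation and variability summaries can be sketched in a few lines. The example below is a minimal illustration, assuming the kinetic parameter is strictly positive and roughly log-normally distributed; the function name `log_scale_summary` and the AUC values are hypothetical and not taken from any study.

```python
import math
import statistics


def log_scale_summary(values):
    """Summarise strictly positive values on the natural-log scale.

    Returns the geometric mean and the geometric coefficient of
    variation (%), the latter assuming a log-normal distribution.
    """
    if len(values) < 2 or any(v <= 0 for v in values):
        raise ValueError("need at least two strictly positive values")
    logs = [math.log(v) for v in values]
    mean_log = statistics.mean(logs)
    sd_log = statistics.stdev(logs)  # sample SD of the logged values
    geo_mean = math.exp(mean_log)
    geo_cv_pct = 100.0 * math.sqrt(math.exp(sd_log ** 2) - 1.0)
    return {"n": len(values), "geo_mean": geo_mean, "geo_cv_pct": geo_cv_pct}


# Hypothetical AUC values from four animals in one dose group
auc = [10.0, 20.0, 40.0, 80.0]
summary = log_scale_summary(auc)
print(summary)
```

With so few animals per group, a summary like this is best read alongside the individual animal values rather than in place of them.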


9.1.2.5 Toxicity Testing


A key part of the nonclinical safety evaluation of a pharmaceutical product is repeated dose and chronic toxicity testing in rodents and non-rodents. In general, clinical development trials of up to 2 weeks in duration will be supported by repeated dose toxicity studies in 2 species (1 non-rodent) for a minimum duration of 2 weeks, conducted to Good Laboratory Practice standards (see Sect. 9.1.3). Clinical trials which last longer than this, up to 6 months, should be supported by repeated dose toxicology studies of at least equivalent duration. ICH S4 (ICH 1998) documents the consensus view on the required duration of nonclinical chronic toxicity testing: a 6 month rodent and a 9 month non-rodent study will generally support dosing for longer than 6 months in clinical trials. The statistician needs to be aware that these studies have many endpoints and relatively small numbers of animals per group. Descriptive statistics and expert scientific judgment, together with knowledge of “normal” ranges for parameters, are critical to interpretation of study outcomes; the statistical significance of effects should be interpreted in this broader context. For a good introduction to the design and size of general toxicity studies of various durations see Sparrow et al. (2011). Chapter 10 provides further information.


9.1.2.6 Reproductive Toxicology


The purpose of reproductive toxicology tests is to assess any impact of the substance tested on mammalian reproduction. This includes male and female fertility, embryo-foetal development and pre- and post-natal development. These animal studies are important in providing information in product labels for men or women wishing to have children and especially for pregnant and lactating women. Although literature, in vitro assays and studies in non-pregnant animals may provide early indications of the potential for reproductive toxicity during drug discovery, the main reproductive toxicity studies will usually come later, after 1 month general toxicity studies in rodents and non-rodents. Reproductive toxicology tests are described in ICH S5(R2) (ICH 2000a). A key statistical consideration for these studies is randomisation of animals to groups, with the pregnant female, the dam, often being the experimental unit, and taking into account the need to spread sibling animals or animals pregnant by the same stud male across groups. Care also needs to be taken with statistical analysis to allow for the dam/litter being the experimental unit. It should be noted that descriptive statistics are important, as is biological plausibility, when evaluating data from these studies, whilst inferential statistics may be used only as support for interpretation of results. Sizing of studies is also an important consideration, as the study must give rise to a sufficient number of litters; hence allowance must be made for some females failing to become pregnant. Other factors to consider in setting group size are the prevalence of events in control populations and the nature of the endpoint(s) being considered (continuous measure or otherwise). Chapter 10 provides further information.
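The sizing point about females failing to become pregnant can be made concrete with a simple binomial calculation. The sketch below assumes that each mated female becomes pregnant independently with the same probability. The function names, the target of 16 litters and the 85% pregnancy rate are illustrative assumptions, not values from the guideline.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def females_needed(litters_required, pregnancy_rate, assurance=0.9):
    """Smallest number of mated females per group giving at least
    `litters_required` litters with probability >= `assurance`."""
    if not 0 < pregnancy_rate <= 1:
        raise ValueError("pregnancy_rate must be in (0, 1]")
    if not 0 < assurance < 1:
        raise ValueError("assurance must be in (0, 1)")
    n = litters_required
    while prob_at_least(litters_required, n, pregnancy_rate) < assurance:
        n += 1
    return n


# Hypothetical inputs: 16 litters wanted, 85% historical pregnancy rate
n_females = females_needed(16, 0.85, assurance=0.9)
print(n_females, prob_at_least(16, n_females, 0.85))
```

In practice the historical pregnancy rate is itself uncertain, so it may be worth repeating the calculation over a plausible range of rates.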


9.1.2.7 Biotechnological Products


Biotechnology-derived pharmaceuticals (biopharmaceuticals) include products derived from characterised cells through the use of a variety of expression systems such as bacteria, yeast, insect, plant, and mammalian cells. These were initially developed in the early 1980s, with the first marketing authorisations granted later in the decade. The guidance for these products in ICH S6(R1) (ICH 2011b) is based on a critical review of experience with submission of applications for biopharmaceuticals. It sets out to provide general principles for designing scientifically acceptable nonclinical safety evaluation programs for these products. Whilst the details are specific to biotechnological products, the main study types or groupings covered are those which form the titles of the companion ICH Safety guidelines. The guidance highlights the criticality of group size in affecting ability to detect toxic events which may be associated with these products; hence a statistician is likely to have a key input into the determination of number of animals per dose.


9.1.2.8 Safety Pharmacology


Safety pharmacology studies investigate the potential undesirable pharmacodynamic effects of a substance on physiological functions in relation to exposure in the therapeutic range and above. The core battery of tests explores the potential for harm to the central nervous system, the cardiovascular system and the respiratory system. Supplementary studies exploring e.g. renal or gastrointestinal effects may be required. These studies are covered by the guidelines ICH S7A (ICH 2000b) and S7B (ICH 2005a) which highlight a key area for statistical contribution. The guidance states that the size of the groups should be sufficient to allow meaningful scientific interpretation of the data generated. Thus, the number of animals or isolated preparations should be adequate to demonstrate or rule out the presence of a biologically significant effect of the test substance, taking into account the size of the biological effect that is of concern for humans. The value added through appropriate powering of safety pharmacology studies is thus widely recognised. It is also worth noting that these studies have a wide variety of endpoints representing continuous, ordinal and nominal data types, so a correspondingly wide variety of statistical methods is applicable to the data generated. Chapter 10 provides additional information; see also Pugsley et al. (2008).
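As a rough illustration of what powering a study involves, the sketch below uses the standard normal-approximation formula for comparing two independent group means. It assumes a parallel-group design, a continuous endpoint and a common standard deviation; the function name `n_per_group` is hypothetical. Many safety pharmacology studies use crossover or other designs, for which a different calculation would be needed.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.8):
    """Approximate animals per group for a two-sided, two-sample
    comparison of means (normal approximation).

    delta: smallest difference in means considered biologically important
    sd:    assumed common between-animal standard deviation
    """
    if delta <= 0 or sd <= 0:
        raise ValueError("delta and sd must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Effect of one SD versus half an SD: halving delta roughly quadruples n
print(n_per_group(delta=1.0, sd=1.0))
print(n_per_group(delta=0.5, sd=1.0))
```

Because the normal approximation is slightly optimistic for small groups, a t-based or simulation-based calculation would normally be used to confirm the final number.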


9.1.2.9 Immunotoxicology


ICH S8 (ICH 2005b) provides recommendations on the nonclinical testing of non-biologicals for immunotoxicity. The term immunotoxicity in this guideline primarily refers to immunosuppression, i.e. a state of increased susceptibility to infections or the development of tumours. Initially, a number of factors are taken into account and a weight of evidence approach used to determine if further immunotoxicology testing is required. Examples of the factors considered are the structural properties of the compound, and observations from standard general toxicity studies in animals, e.g. certain haematological changes. Chapter 11 provides further information on immunogenicity evaluation.



9.1.3 A Word About Good Laboratory Practice


The regulatory authorities in many parts of the world, including the European Union and the United States of America, require studies designed to evaluate the safety of a new chemical or biological substance to be completed in accordance with Good Laboratory Practice (GLP) principles. Some countries, such as the United Kingdom, have implemented their own statutory regulations (based on the GLP principles), which means that this work is required by law to comply with the regulations. The GLP principles provide a framework to ensure that the planning, execution, monitoring, recording, reporting and archiving of a study are conducted in a suitably rigorous manner. The purpose of GLP is to provide assurance to regulatory authorities that the results presented for a given study reflect accurately what happened in the study and are therefore reliable for assessing the safety or risk associated with the substance under test. The statistician working on GLP studies will require GLP training, and periodic refresher training, and will need to follow defined standard operating procedures (SOPs) as well as adhering to a variety of other formal requirements such as the maintenance of records of qualifications, training, and ongoing continuous professional development. Analytical or computer systems that are used to analyse data and produce statistical results to be included in the study report for a GLP study must be validated.



9.2 Starting the Dialogue: The Statistician and the Safety Scientist




‘The time has come,’ the Walrus said ‘to talk of many things.’ (Lewis Carroll, in Through the Looking-Glass and What Alice Found There, 1872)


9.2.1 The Right Time to Talk


“We need your help to analyse some extra data collected on one of our studies, please,” said the voice on the end of the phone. “We just need to see whether levels of ‘A’ in plasma change significantly with dose-level and time post-dose when this species is treated with compound ‘Y’.” “Was a statistician involved in the design of the additional data collection?” came the reply. “Oh no, this extra investigation was added on to a standard study, so the number of animals and the design was already set. We just need help with the data analysis.”

This sort of scenario is by no means rare. Making ‘minor’ changes to an otherwise standard study to collect additional information is often referred to as ‘signal searching’. It may not be immediately obvious to the experimentalist that the chances of detecting interpretable signals are increased if the study design is discussed with a statistician before the study starts, and before resources (e.g. animals) are ordered, which is often some time ahead of study execution. At the point of data analysis, however, it may be too late for the statistician to help. To quote Ronald Fisher: “To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.”

When the statistician and scientist met following the phone call described above, it became apparent that the effects of dose-level and time on ‘A’ couldn’t be accurately interpreted. Samples for measurement of ‘A’ had only been collected after the animals had been dosed with compound, and due to the relatively small study size, and relatively large variation in ‘A’ between animals and across time, effects were unclear. Had a small number of pre-dose samples been taken to indicate each animal’s baseline level and variation in ‘A’, the outcome might have been different. Had the statistician been engaged earlier, pros and cons of taking pre-dose samples in this situation might have been discussed, even though these were not required for the ‘standard’ part of the study’s purpose.

Discovering difficulties at the analysis stage that were not considered at the design stage is by no means limited to the in vivo setting. An example of a common shortcoming that hinders the evaluation of in vitro study data is insufficient characterisation of the assay’s variation on different testing occasions; that is, different occasions when the entire assay process is replicated from start to finish, including the sample preparation for the substance(s) under test. A conversation with a statistician when setting up, evaluating or routinely running in vitro assays can help to draw out the different potential sources of variation for the assay, and ensure these are appropriately understood and allowed for during the validation and routine running of the assay.
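One simple way to characterise occasion-to-occasion variation is to split the total variation into between-occasion and within-occasion components. The sketch below uses the usual method-of-moments estimators for a balanced one-way random effects layout, that is, the same number of replicates on every occasion. The function name and the data are hypothetical.

```python
import statistics


def variance_components(occasions):
    """Estimate within- and between-occasion variance components.

    occasions: list of lists, one inner list of replicate results per
    independent testing occasion, all of the same length (balanced).
    """
    k = len(occasions)
    n = len(occasions[0]) if k else 0
    if k < 2 or n < 2 or any(len(o) != n for o in occasions):
        raise ValueError("need >= 2 occasions, each with the same n >= 2 replicates")
    occasion_means = [statistics.mean(o) for o in occasions]
    # Pooled within-occasion variance (mean square within)
    ms_within = statistics.mean([statistics.variance(o) for o in occasions])
    # Mean square between occasions
    ms_between = n * statistics.variance(occasion_means)
    # A negative estimate is truncated at zero
    var_between = max(0.0, (ms_between - ms_within) / n)
    return {"within": ms_within, "between": var_between}


# Hypothetical assay results: three occasions, two replicates each
data = [[10.0, 12.0], [20.0, 22.0], [30.0, 32.0]]
components = variance_components(data)
print(components)
```

In this made-up example nearly all of the variation is between occasions, which is the situation where replicates run on a single occasion would give a misleadingly optimistic picture of assay precision.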

Although the examples given are not unique to nonclinical safety testing, it is worth being particularly alert in this environment, where many studies are run and analysed in a fairly standard manner, and the value of contacting the statistician about a minor change to objective or design may not be so obvious as for a bespoke study. Even when a specifically designed investigative study is required, the investigator may by default propose to run the study using a traditional design with which they are experienced, and in such a situation the statistician may be able to suggest a more efficient and effective design or analysis, or simply recommend minor adjustments to ensure that the objectives of the study can be met.

Of course, if the statistician is named as playing an official role in a regulated safety study, they will, as part of routine procedures, have the opportunity to review and comment upon the study plan. However, when this is not the case, how can early dialogue between study personnel and statisticians be encouraged? How can the message be shared that it’s never too early to contact a statistician, but often too late?

Any individual scientist probably learns this most effectively through their own experience and examples of working with statisticians, hence the relationship between the statistician and scientist is a key factor. Anything which facilitates this relationship (e.g. co-located working) is therefore likely to help. However, partly due to frequent personnel changes, this is sometimes a slow and not necessarily sufficient or enduring way to spread the message widely in an organisation. The statistician could proceed by working to integrate statistical considerations into the existing relatively formal procedures in nonclinical safety departments. For example, the key personnel involved in safety studies will require specific documented training, so it makes sense to try to ensure that an informal meeting with a statistician and some illustrations of when, where and how to engage a statistician form part of this training, in addition to any formal statistical training the scientist receives. Additionally, it may help to document in standard departmental procedures when and how a statistician needs to be engaged to review individual study proposals and protocols. Peers et al. (2014) describe the benefits this can have when implemented systematically within an organisation.

Another way to encourage such dialogue and interaction early is to discuss regularly with managers current examples of the positive impact of early statistician/scientist collaboration, using these opportunities to increase awareness of common pitfalls to be avoided. This will help to secure managers as advocates who will encourage their staff to engage early with statisticians.


9.2.2 The Right Things to Talk About


Whenever the statistician is involved in the advance plans for a study, there will usually be a lot more to talk about than may initially be apparent. We focus here on three questions which may be particularly pertinent for nonclinical safety studies, especially in vivo.


What Do We Already Know?

What do we already know about this compound/drug/target from our studies or from the literature? If a safety study, investigative or otherwise, is being run, there’s likely to be relevant prior information of some sort. Understanding where the study sits in a program of work, and what prior knowledge exists, and with what level of confidence, is important when setting up designs. As an example, one investigative pharmaceutical safety study involved studying two different compounds. More relevant nonclinical information was available for one of these compounds than the other. Although the initial design proposed had all groups in the study being the same size, after statistical consultation, it was agreed that the study goals could be achieved using fewer animals for those groups to be dosed with the more fully characterised compound.


What Is Fixed and What Is Flexible?

If asked to comment on a study design, the statistician will do well to establish: which aspects of the study are fixed e.g. due to regulatory or other requirement; which aspects are established practice and would cause difficulties if altered; and where there is scope to be flexible. The inclusion of positive controls on studies serves as an example here. For safety pharmacology studies, the ICH guideline S7A (ICH 2000b) states: “In well-characterised in vivo test systems, positive controls may not be necessary. The exclusion of controls from studies should be justified.” So for an individual study, the agreed regulatory view determines that positive controls should either be included or their exclusion justified. It is likely that each organisation will have established practice relating to this which needs to be taken into account too: if it is established practice to include positive controls on all studies, then all study design, analysis and reporting systems and protocols will be set up to support this. However, there may still remain some flexibility e.g. it may sometimes be appropriate to propose a smaller group size for the positive control group than for the other groups.


Why Are Things Done This Way?

If things are done traditionally in your organisation in a certain way, do try to understand why before questioning or challenging based on your experience from working in a different area, or your theoretical knowledge. An example here is the determination of cage layout plans, and whether cages belonging to the same treatment group are kept together, or spread in a balanced way across rows and columns of the cage racking system. There are competing considerations here. On the one hand, it is desirable to reduce to an absolute minimum any opportunity for cross contamination between cages of animals receiving different treatments, and in many animal room settings, written procedures may require that this is done by keeping all cages for any treatment group underneath one another on the same rack. This also tends to facilitate speedier dosing, group by group, as any formulation can be quickly administered to relevant cages easily identifiable through their co-location. The competing consideration is the possibility that bias could creep in unintentionally when groups are experiencing very slightly different processes during the study such as the timing of receiving their dose and marginally different environmental conditions associated with different locations in the room. As caging and air-handling systems change over time, the balance of risks and benefits associated with different practices will change, so it’s worth asking the “why” question periodically, to see if the balance has changed favouring a change in practice.
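For the option of spreading groups in a balanced way across rows and columns, a randomised Latin square is one possible layout. The sketch below assumes a square section of racking with as many rows and columns as there are treatment groups; the function name and group labels are hypothetical. It illustrates the balance idea only and does nothing to address the cross-contamination concern described above.

```python
import random


def balanced_rack_layout(groups, seed=None):
    """Return a g x g grid of group labels in which every group appears
    exactly once in each rack row and each rack column."""
    rng = random.Random(seed)
    g = len(groups)
    if g < 2 or len(set(groups)) != g:
        raise ValueError("need at least two distinct group labels")
    labels = list(groups)
    rng.shuffle(labels)   # random assignment of labels to square symbols
    rows = list(range(g))
    cols = list(range(g))
    rng.shuffle(rows)     # random row order
    rng.shuffle(cols)     # random column order
    # Cyclic Latin square with permuted rows, columns and labels
    return [[labels[(r + c) % g] for c in cols] for r in rows]


layout = balanced_rack_layout(["control", "low", "mid", "high"], seed=2016)
for rack_row in layout:
    print(rack_row)
```

Fixing the seed makes the layout reproducible, which may be helpful when the allocation has to be documented.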


9.2.3 The Right People to Talk To


The statistician may find that an overwhelming number of people are involved in any given safety study, particularly for GLP in vivo studies. Whilst the Study Director represents the single point of study control with ultimate responsibility for the overall scientific conduct of the study, any study is likely to require the input of multiple specialists, including experts in pathology, clinical pathology, formulation, toxicokinetics and toxicology, together with those who ensure excellence in the running of the study through co-ordination and delivery of in-life animal procedures. Some, maybe all of these individuals, will have important opinions on study design, including how many animals might be needed, how best to allocate animals to groups, and other scientific and practical considerations.

If you work in a contract research organisation or a scenario where the statistician’s input is limited primarily to performing power calculations and providing a statistical analysis plan to match a given design, then interaction with the Study Director may suffice. However, wherever possible and appropriate, it is far more effective and satisfying if the statistician can be included as an expert in meetings alongside other experts who together form the study design team. For example, if a specific haemodynamic marker is a critical endpoint in a particular study, the statistician may venture to suggest that randomisation of animals to groups should take into account a baseline measurement of this parameter to ensure balanced groups at the outset. It is much easier and more efficient to discuss the pros and cons of such a suggestion if the study in-life co-ordinator and the clinical pathology representative are seated around the same table as the Study Director and statistician, bringing together perspectives on design questions such as these.
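One common way to take a baseline measurement into account at randomisation is to rank the animals by baseline, cut the ranking into blocks of as many animals as there are groups, and randomise within each block. The sketch below is a minimal version of that idea. It assumes the number of animals is a multiple of the number of groups, and the function name, animal identifiers and baseline values are all hypothetical.

```python
import random


def baseline_blocked_allocation(baselines, groups, seed=None):
    """Allocate animals to groups, blocking on a baseline measurement.

    baselines: dict mapping animal id -> baseline value
    groups:    list of distinct group labels
    Returns a dict mapping animal id -> group label.
    """
    g = len(groups)
    if g < 2 or len(baselines) % g != 0:
        raise ValueError("number of animals must be a multiple of the number of groups")
    rng = random.Random(seed)
    ranked = sorted(baselines, key=baselines.get)  # animal ids by baseline
    allocation = {}
    for start in range(0, len(ranked), g):
        block = ranked[start:start + g]   # animals with similar baselines
        order = list(groups)
        rng.shuffle(order)                # random group order in this block
        allocation.update(zip(block, order))
    return allocation


# Hypothetical baseline readings for 12 animals and three groups
baseline = {f"A{i:02d}": 100 + 3 * i for i in range(12)}
alloc = baseline_blocked_allocation(baseline, ["control", "low", "high"], seed=7)
print(alloc)
```

If a scheme like this is used, the blocking or the baseline value would usually also be reflected in the statistical analysis.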

Section 9.2 has focused on the nuts and bolts of ensuring that statistical considerations are raised at the right time, in the right way, and with the right people. Section 9.3 will outline some of the typical characteristics of safety studies which may be unfamiliar to a statistician coming into this area.


9.3 Design Considerations for Safety Studies




At first we were amused at the dramatic (over) design of the hotel, but it seemed like sometimes design overshadowed function. There was no hot water in the bathroom. (TripAdvisor 2009)

It is possible to fall into traps at opposite ends of the spectrum when it comes to designing nonclinical drug safety studies: to accept current practice with insufficient questioning, or to get so overwhelmed thinking through all the considerations that study functionality gets overshadowed. In order to help the statistician new to this domain avoid each of these extremes, we outline here a few characteristics of studies and their related design issues which may be encountered more frequently in this setting than in other areas.

Jul 22, 2016 | Posted by in PHARMACY | Comments Off on Nonclinical Safety Assessment: An Introduction for Statisticians
