Experimental Design and Statistical Analysis for Pharmacology and the Biomedical Sciences. Paul J. Mitchell
For the last 25 years or so, I have become increasingly involved in teaching the fundamentals of statistical analysis of experimental data to, initially, pharmacy and pharmacology undergraduates but, more recently, undergraduates in other disciplines (e.g. natural sciences, biomedical sciences, biology, biochemistry, psychology, and toxicology), as well as postgraduate students and early-career researchers in these and more specific areas of pharmacological research (e.g. neuropharmacology). Throughout this time, I have become increasingly aware of the statistical rigour that scientific journals demand before they will approve scientific papers for publication. This, however, has been coupled with increasing anxiety on the part of both new and experienced researchers as to whether their statistical approach is correct for the data generated by their studies. These observations suggest that, in the past, the teaching of experimental design and statistical analysis has been poor across the sector. Indeed, if I mention stats (sic) to most researchers, they hold their hands to their face and perform an imitation of Edvard Munch's Der Schrei der Natur (‘The Scream of Nature’, more commonly known simply as ‘The Scream’)! Statistical analysis is often viewed as a burdensome and inconvenient chore, an attitude generally borne out of ignorance and a lack of appreciation of just how useful rigorous statistical analysis can be.
[Figure: Der Schrei der Natur (circa 1893), Edvard Munch (1863–1944)]
I'll give you three examples:
Example 1:
On various occasions, I have had final-year undergraduate students bang on my office door; I say bang – it is always more than just a polite knock, probably borne out of fear (of me? Never!), frustration, or sheer panic – holding a raft of printed data in one hand and a mug of coffee in the other (so how did they bang on the door?). After a period of trying to quell their anxiety, I learn that the student has been sent to me for advice on how to analyse the plethora of data generated by their final-year project. My initial response has always been ‘I'm sorry, I can't help you!’. At this point, the student invariably looks at me incredulously and dissolves in floods of tears, sobbing ‘but you must help me, my project report is due in tomorrow (why is it always tomorrow?), and my supervisor said you are the stats guru and would be happy to help – I have nobody else to turn to and I really want a first’ (don't they always)!

I then explain to them that good scientific research is achieved by good experimental technique. This involves high-quality experimental design, not only of the experimental protocol itself but also of how the resulting data are to be analysed. I then tell them to return to their supervisor and explain that the data are worthless and should be binned forthwith, and that the two of them need to sit down, go over the experimental design, and build into their protocols the exact method by which they will analyse the resulting data before any of the planned experiments are performed. Good experimental design requires knowledge of the expected experimental output: what type of data will my experiment generate? It is only with this knowledge that the appropriate statistical techniques may be identified. The statistical tests used are an important component of the Methodology and, as such, must be identified before getting your hands dirty and performing the experiments. Once the student and supervisor have identified the appropriate statistical approach, the student can perform the experiments and gather the resulting data.

The student's response (in between further floods of tears and sobs) is something along the lines of ‘so all this data is worthless (yes)? I need to identify the stats tests to use (yes!)? And then I do the experiments (yes!!)? BUT MY REPORT IS DUE IN TOMORROW (Tough! Not my problem)!!’ Whereupon the student invariably storms out of my office, reams of paper in one hand, coffee mug in t'other, slamming the door behind them (how do they do that if their hands are full?).

Some 300 ms later, either my phone rings or there are further knocks on my office door – it's the supervisor concerned, and rather irate! (I exaggerate here – I've never known an academic move that fast.) After carefully explaining the requirements of good experimental design and statistical analysis (to somebody who, let's face it, should know this anyway!), I finally agree to look at the student's data and provide advice as to how the data may be analysed. Interestingly, in subsequent years, it is the same supervisor's students who bang on my door seeking advice (again, too late in my opinion), so perhaps you can't teach an old dog new tricks. Most importantly, however, it is a lesson learnt by the student, so at least the next generation of pharmacologists have a fair chance of getting it right!
The principal problem here is ignorance of the fact that rigorous statistical analysis is a component of good experimental design; consequently, the statistical methodology to be employed must be decided before the experiments are performed. In my experience, this stems from historically poor teaching of statistics in pharmacology across the sector, such that those who now carry the responsibility of teaching pharmacology to current undergraduates or newly qualified graduates (whether in academia or the pharmaceutical industry) are themselves at a disadvantage and too naive to appreciate the importance of rigorous statistical analysis. As a result, they are unable to provide the high-quality supervision that less experienced individuals need to develop and hone their experimental technique.
Example 2:
I was once stopped in the corridor by a fellow post-doc (and close friend) who described a series of cell-culture experiments, using different growth media, which they were unsure how to analyse. Essentially, the post-doc had a single flask of a particular CHO cell line and was trying to determine which of three media best promoted cell growth. Three further flasks were prepared, each containing a different medium, and a sample of the cell line was decanted into each of the three test flasks. Some time later, three samples were taken from each flask (so nine samples in total) and the number of cells per unit volume determined.

The question was: ‘How do I analyse the data? Do I do a number of t-tests (are they paired)? Do I do ANOVA? And if so, which post hoc test (don't worry, I'll explain all these terms later in the book)?’ I looked at the data, checked I had the right information about the design of this simple experiment, and said ‘Sorry, you don't have enough data for statistical analysis – you only have an n of one in each case’.

The post-doc stared at me quizzically and said, ‘Don't be daft, I have n of 3 for each medium!’. ‘Er…, no!’, I replied, ‘You estimated the cell numbers in triplicate, but that only gives you n = 1 in each case; all you've done is obtain an estimate of the precision (and, hopefully, the accuracy) of your estimates, but that doesn't change the fact that you've only got n = 1 for each flask’.

‘No, no, no!’ the post-doc strenuously exclaimed, ‘I have n = 3 in each case, three samples from each flask for the different media!’. ‘Er, no!’, I replied (at the risk of repeating myself), ‘If you wanted to do this properly, then you should have prepared three flasks for each medium (so nine flasks in total) and decanted the same volume of CHO cells into each flask. Some time later, you should then have taken three samples from each flask (so 27 samples in total) and estimated the cell number in each case. You would then calculate the average for each flask, so that the triplicate measures give you a more accurate estimate of the cell concentration in each flask. This would give you three independent measures for each medium, which you could analyse by one-way ANOVA followed by a Tukey All Means post hoc test (don't worry about these terms; all will become clear later in the book. I just included them to impress you, whet your appetite for what is to come, and to try and convince you I know what I'm talking about!)’.

The post-doc looked at me aghast! ‘I don't have time for that!’, came the reply, ‘I have a group meeting with my Prof. this afternoon and I need to present these data so we can discuss which medium to use in our future studies – our latest grant proposal depends on demonstrating that one of these media is significantly different from the others, so I need to subject these data to statistical analysis!’.

I looked at the summary bar chart the post-doc had prepared from the data, and it was clear from the eye-ball test (this is probably one of the best tests to use to appreciate data and is very simple to perform – I'll reveal how later in the book!) that one of the media showed clear advantages in terms of cell growth over the others. ‘Just look at your data’, I said, ‘Medium X is
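The corrected design described in that exchange lends itself to a short worked sketch. The Python snippet below is my own illustration, not part of the original conversation: the cell counts, medium labels, and random seed are all invented purely for demonstration. It simulates three flasks per medium, averages the triplicate counts so that each flask contributes a single value (a true n = 3 per medium), and then runs a one-way ANOVA followed by a Tukey post hoc test:

```python
# A minimal sketch of the corrected design: 3 flasks per medium, each flask
# sampled in triplicate. All numbers below are invented for illustration only.
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(1)  # hypothetical seed, for reproducibility

# Hypothetical triplicate counts (cells/mL): shape (3 media, 3 flasks, 3 samples)
true_means = np.array([1.0e6, 1.2e6, 1.8e6]).reshape(3, 1, 1)
counts = rng.normal(loc=true_means, scale=5e4, size=(3, 3, 3))

# Average the triplicates: one value per flask, i.e. n = 3 flasks per medium.
# The triplicates only improve the precision of each flask's estimate; they
# are NOT independent replicates, so they must not be counted as separate n.
flask_means = counts.mean(axis=2)  # shape (3 media, 3 flasks)

# One-way ANOVA across the three media (one group of flask means per medium)
f_stat, p_value = f_oneway(*flask_means)
print(f"one-way ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

# Tukey post hoc test comparing every pair of media
groups = np.repeat(["medium_A", "medium_B", "medium_C"], 3)
print(pairwise_tukeyhsd(flask_means.ravel(), groups))
```

The key design point the sketch encodes is that the flask, not the pipette sample, is the experimental unit: averaging the triplicates first is what turns 27 raw numbers into the nine independent observations that the ANOVA actually requires.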