lab today usually involve the behavior of systems with many components or processes with several simultaneous mechanisms. It is not easy to translate a “need to know” into an experiment under such complex conditions. The first problem is deciding what scalars (i.e. what measurable items) are important to the phenomenon being investigated. This step often takes place so fast and so early in a test program that its significance is overlooked. When you choose what to measure, you implicitly determine the relevance of the results.
The early years of the automobile industry provide at least one good example of the consequences of “leaping in” to measurements. As more and more vehicles took to the road, it became apparent that some lubricating oils were “better” than others, meaning that automobile engines ran longer or performed better when lubricated with those oils. No one knew which attributes of the oils were important and which were not, and so all the easily measured properties of the “good” oils were measured and tabulated. The result was a “profile of a good oil.” The oil companies then began trying to develop improved oils by tailoring their properties to match those of the “good oil profile.” The result was a large number of oils that had all the desired properties of a good oil except one: they didn't run well in engines!4
2.4 How Does Experimental Work Differ from Theory and Analysis?
The techniques of experimentation differ considerably from those of analysis, owing to the nature of the two approaches. It is worthwhile to examine some of these differences.
2.4.1 Logical Mode
Analysis is deductive and deals with the manipulation of a model function. The typical problem is: given a set of postulates, what is the outcome? Experiment is inductive and deals with the construction of models. The typical problem is: given a set of input data and the corresponding output data, what is the form of the model function that connects the output to the input? Analytical techniques are manipulative, whereas experimental techniques are those of measurement and inference.
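To make the inductive task concrete, here is a minimal sketch (not from the original text; the data, noise level, and candidate models are all hypothetical): the experimenter proposes candidate model forms and asks which one best connects the measured outputs to the inputs.

```python
import numpy as np

# Hypothetical input-output data from an experiment; the "true" model is
# unknown to the experimenter (here we secretly generate it as linear).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 20)
y = 2.0 * x + 0.5 + rng.normal(0.0, 0.1, x.size)

# Candidate model forms proposed by the experimenter
candidates = {
    "linear    y = a*x   + b": np.column_stack([x, np.ones_like(x)]),
    "quadratic y = a*x^2 + b": np.column_stack([x**2, np.ones_like(x)]),
}

# Inductive step: fit each candidate and compare its residuals to the scatter
for name, A in candidates.items():
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rms = float(np.sqrt(np.mean((A @ coef - y) ** 2)))
    print(f"{name}:  coefficients = {np.round(coef, 3)},  rms residual = {rms:.3f}")
```

Nothing in the fit itself tells us the model is right; it tells us only which candidate is more consistent with these particular data, within their scatter.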
2.4.2 Persistence
An analysis is a persistent thing. It continues to exist on paper long after the analyst lays down his or her pencil. That sheet (or that computer program) can be given to a colleague for review: “Do you see any error in this?”
An experiment is a sequence of states that exist momentarily and then are gone forever. The experimenter is a spectator, watching the event – the only trace of the experiment is the data that have been recorded. If those data don’t accurately reflect what happened, you are out of luck.
An experiment can never be repeated – you can only repeat what you think you did. Taking data from an experiment is like taking notes from a speech. If you didn't get it when it was said, you just don't have it. That means that you can never relax in the lab. Any moment when your attention wanders is likely to be the moment when the results are “unusual.” Then, you will wonder, “Did I really see that?” [Please see “Positive Consequences of the Reproducibility Crisis” (Panel 2.1). The crisis, by way of the Ioannidis article, was mentioned in Chapter 1.]
The clock never stops ticking, and an instant in time can never be repeated. The only record of your experiment is in the data you recorded. If the results are hard to believe, you may well wish you had taken more detailed data. It is smart to analyze the data in real time, so you can see the results as they emerge. Then, when something strange happens in the experiment, you can immediately repeat the test point that gave you the strange result. One of the worst things you can do is to take data all day, shut down the rig, and then reduce the data. Generally, there is no way to tell whether unusual data should be believed or not, unless you spot the anomaly immediately and can repeat the set point before the peripheral conditions change.
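As a hedged illustration of that advice (a sketch under assumed conditions, not the author's procedure; the function names and threshold are hypothetical), each reading can be reduced the moment it is logged and compared against the recent record, so an anomaly is caught while the set point can still be repeated:

```python
import numpy as np

FLAG_SIGMA = 3.0   # hypothetical threshold: flag points this far from the recent trend

def reduce_point(raw):
    """Placeholder for on-the-spot data reduction (calibration, unit conversion)."""
    return float(raw)

def log_and_check(raw, history):
    """Reduce a reading the moment it is taken and flag it if it looks unusual,
    so the operator can repeat the set point before conditions change."""
    value = reduce_point(raw)
    if len(history) >= 5:
        recent = np.array(history[-10:])       # compare against the recent record
        mean, std = recent.mean(), recent.std()
        if std > 0 and abs(value - mean) > FLAG_SIGMA * std:
            print(f"Unusual reading {value:.3f} (recent {mean:.3f} +/- {std:.3f}); "
                  "repeat this set point now, while conditions hold.")
    history.append(value)
    return value
```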
2.4.3 Resolution
The experimental approach requires gathering enough input–output data sets that the form of the model function can be determined with acceptable uncertainty. This is, at best, an approximate process, as a simple example shows. Consider the differences between the analytical and the experimental approaches to the function y = sin(x). Analytically, given that function and an input set of values of x, the corresponding values of y can be determined to any desired accuracy, using the known behavior of the function y = sin(x). Consider now a “black box” which, when fed values of x, produces values of y. With what certainty can we claim that the model function (inside the box) is really y = sin(x)? That depends on the accuracy of the input and output data points and on their number and spacing. With a set of data having some specified number of significant figures in the input and the output, we can say only that the model function, “evaluated at these data points, does not differ from y = sin(x) by more than …,” or alternatively, that “y = sin(x) within the accuracy of this experiment, at the points measured.”
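A minimal sketch of that claim (not from the original; the black box, noise level, and set points are hypothetical assumptions) probes the box at chosen values of x and reports the largest observed deviation from sin(x), which is all the data entitle us to assert:

```python
import numpy as np

def black_box(x):
    """Hypothetical apparatus: its internal model is unknown to us.
    (Here it happens to return sin(x) plus a little measurement noise.)"""
    rng = np.random.default_rng(1)
    return np.sin(x) + rng.normal(0.0, 0.01, np.shape(x))

x = np.linspace(0.0, 2.0 * np.pi, 25)   # the set points we chose to test
y = black_box(x)

# The strongest defensible claim: at these points, the box's output does not
# differ from sin(x) by more than the observed maximum deviation.
max_dev = np.max(np.abs(y - np.sin(x)))
print(f"At the measured points, |y - sin(x)| <= {max_dev:.4f}")
```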
That is about all we can be sure of, because our understanding of the model function can be affected by the choice of the input values. Suppose we were unfortunate enough to choose set points for the test rig that fell exactly at x = nπ, with n an integer. Then every output would be zero, and we could not distinguish the “aliased” model function y = 0 from the true model function y = sin(x).
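The aliasing hazard is easy to reproduce numerically. In this short sketch (the sampling choices are hypothetical), every set point falls on a multiple of π, so the record cannot distinguish y = sin(x) from y = 0:

```python
import numpy as np

n = np.arange(0, 10)
x_bad = n * np.pi          # unlucky set points: every x is a multiple of pi
y_bad = np.sin(x_bad)      # every output is zero (to machine precision)

# y = 0 and y = sin(x) are indistinguishable on this data set
print(np.allclose(y_bad, 0.0))   # True
```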
In general, with randomly selected values of x, the “resolution” of the experiment is limited by the accuracy of the input and output data. Consider Figure 2.2. In this case, sin(x) may be indistinguishable from {sin(x) + 0.1 sin(10x)} if there is significant scatter in the data. In many cases, the scatter in the data is, in reality, the trace of an unrecognized component of the model function that could be included. One of an experimenter's most challenging tasks is to interpret small changes in the data correctly: is this just “scatter,” or is the process trying to show me something?
Figure 2.2 Is this a single sine wave with some scatter in the data, a sine wave with a superposed signal, or one with both a signal and scatter?
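As a rough numerical companion to Figure 2.2 (the noise level and data are hypothetical, not taken from the book), the sketch below generates data from sin(x) + 0.1 sin(10x) with scatter of comparable size; the residuals left after assuming a pure sin(x) model are then barely larger than the scatter alone:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0.0, 2.0 * np.pi, 40))

# Data generated by sin(x) + 0.1*sin(10x), with scatter of comparable size
y = np.sin(x) + 0.1 * np.sin(10.0 * x) + rng.normal(0.0, 0.1, x.size)

# Residuals after assuming the model is pure sin(x)
residuals = y - np.sin(x)
print(f"rms residual = {np.sqrt(np.mean(residuals**2)):.3f} "
      f"(pure scatter alone would give about 0.10)")
# The 0.1*sin(10x) component hides inside what looks like ordinary scatter.
```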
2.4.4 Dimensionality
Another important difference between experiment and analysis is their description in terms of “dimensionality.” A necessary first step in an analysis is to declare the dimensional domain of relevant factors, e.g. y = f(x1, x2, x3, …, xN). Once the analyst has declared the domain, he or she may proceed with certainty, applying the rules appropriate for functions of N variables.
Experimentalists cannot make their results insensitive to “other factors” simply by declaration, as the analyst can. The test program must be designed to reveal and measure the sensitivity of the results to changes in the secondary variables.
Experiments are always conducted in a space of unknown dimensionality. Any variable that affects the outcome of the experiment is a “dimension” of that experiment. Whether or not we recognize the effect of a variable on the result may depend on the precision of the measurements.
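To illustrate that last point with a hedged sketch (the effect size, noise levels, and sample counts are invented for illustration): an unrecognized secondary variable x2 shifts the result by less than the measurement scatter at coarse precision, so the experiment appears not to have that dimension; improving the precision makes the dimension appear.

```python
import numpy as np

rng = np.random.default_rng(3)
TRUE_EFFECT = 0.01   # hypothetical influence of an unrecognized secondary variable

def run(x2, sigma, n=20):
    """Hypothetical experiment: n repeated readings at one setting of x2."""
    return TRUE_EFFECT * x2 + rng.normal(0.0, sigma, n)

for sigma in (0.05, 0.005):                  # coarse vs. fine measurement precision
    low, high = run(0.0, sigma), run(1.0, sigma)
    shift = high.mean() - low.mean()         # apparent effect of changing x2
    u_shift = sigma * np.sqrt(2.0 / 20.0)    # standard uncertainty of that shift
    print(f"sigma = {sigma}: shift = {shift:+.4f} +/- {u_shift:.4f}")
# At coarse precision the shift is buried in its own uncertainty and x2 looks
# irrelevant; at fine precision the extra "dimension" becomes visible.
```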