First I'll describe why we should perform self-experiments in this way (if we're going to perform them at all), then how to go about doing so, and finally what to do in the cases where such rigorous self-experimentation is obviously impractical.
Just to be clear, I'm not suggesting everyone should actually go forth and begin performing experiments in this manner. But it's useful to understand the theoretical principles, and for those who are interested in seeing how certain foods affect their blood sugar, blood pressure, or some other parameter, this post will be of practical import.
- Repeated observations.
The reason for repeated observations is simple: if I want to show that my response to two different foods is different, I need to show that the variation between them is greater than the variation within them. Say I want to know whether bananas spike my blood sugar more than strawberries. To address this question, I eat bananas for breakfast on Monday and my blood sugar goes up to 130 mg/dL, and then I eat strawberries for breakfast on Tuesday and my blood sugar goes up to only 125 mg/dL. Does this support my hypothesis? Not really. The reason is that I have no idea what my blood sugar would have gone up to if I had eaten either fruit a second or third time. If I ate strawberries again on Wednesday and my blood sugar went up to 135 mg/dL, suddenly my conclusions would fall apart.
I can avoid this problem by repeating my strawberry trial a few times and my banana trial a few times so I can assess the natural variation in my responses to each fruit. If the difference in the average response to each fruit is large enough or the variation within my responses to each fruit is small enough, I can conclude that one raises my blood sugar more than the other. I'll describe how to make that decision below.
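To make the "between versus within" idea concrete, here is a minimal Python sketch. The readings are made up for illustration (they are not real measurements), and the function name `summarize` is just a hypothetical helper.

```python
from statistics import mean, stdev

# Hypothetical peak blood sugar readings in mg/dL (illustrative numbers only).
bananas = [130, 134, 128]
strawberries = [125, 135, 127]


def summarize(readings):
    """Return the average and the sample standard deviation of the readings."""
    return mean(readings), stdev(readings)


banana_mean, banana_sd = summarize(bananas)
strawberry_mean, strawberry_sd = summarize(strawberries)

# Variation between the fruits: the gap between the two averages.
between = banana_mean - strawberry_mean

print(f"bananas:      mean {banana_mean:.1f}, SD {banana_sd:.1f}")
print(f"strawberries: mean {strawberry_mean:.1f}, SD {strawberry_sd:.1f}")
print(f"difference between the averages: {between:.1f}")
```

With these invented numbers, the gap between the averages is smaller than the spread within each fruit, which is exactly the situation where a single Monday-versus-Tuesday comparison would mislead us.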
The reason for randomization is a little less obvious. Since the secret's already out that this blog is just a front for the military-industrial complex, I'll defer to my good friend* Donald Rumsfeld:
Randomization is a way to control for the unknowns, and especially for the unknown unknowns.
If we were going to divide people into two groups for a controlled clinical trial, we would have to allocate them randomly. In our self-experiment, we have to allocate the order of trials randomly. In other words, I can't test the effect of bananas five times this week and then test the effect of strawberries five times next week. I have to alternate between bananas and strawberries in random order.
The simple reason is that time is a confounder. Time, in fact, is the worst confounder of all because it indirectly introduces a whole host of unknowns, both of the known and unknown variety. We could all make lists of things that can change with time. The lists might look very different from one another and if we pooled them all into one list it would be ginormous. The confounders we didn't include because none of us thought of them would be more numerous still. In principle, randomizing the order of trials controls for them all by taking time out of the equation entirely.
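A toy simulation can show why time matters. The sketch below assumes, purely for illustration, that the two fruits have identical effects and that readings drift upward by 1 mg/dL per day for some unrelated reason. It compares a blocked schedule (five banana days, then five strawberry days) with many fully shuffled schedules. The names `drift` and `difference` are hypothetical.

```python
import random
from statistics import mean

DAYS = 10
# Assumed for illustration: readings drift upward by 1 mg/dL per day and
# the two fruits have exactly the same true effect.
drift = [float(day) for day in range(DAYS)]


def difference(order):
    """Mean strawberry reading minus mean banana reading for a schedule."""
    banana = [drift[d] for d, fruit in enumerate(order) if fruit == "banana"]
    strawberry = [drift[d] for d, fruit in enumerate(order) if fruit == "strawberry"]
    return mean(strawberry) - mean(banana)


# Blocked schedule: all bananas first, then all strawberries.
blocked = ["banana"] * 5 + ["strawberry"] * 5
blocked_diff = difference(blocked)

# Randomized schedules: shuffle the same ten trials many times.
rng = random.Random(0)
diffs = []
for _ in range(10_000):
    order = blocked[:]
    rng.shuffle(order)
    diffs.append(difference(order))
randomized_avg = mean(diffs)

print(f"blocked schedule, apparent difference: {blocked_diff:.2f}")
print(f"randomized schedules, average apparent difference: {randomized_avg:.2f}")
```

The blocked schedule attributes the whole drift to the fruit. Any single shuffled schedule can still show some difference by chance, but on average the drift cancels out, and that chance variation is what the statistical test accounts for.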
How to Randomize
The easiest way to randomize the order of our self-experiment would be to use a random number generator. If we hop over to Random.Org we can generate random numbers within a certain range. A simple way of randomizing would be to have “0” code for doing strawberries first and bananas second, and have “1” code for the opposite. We could randomly generate a few zeroes and ones and then we'd be done. Since we are only making a simple comparison between two fruits, we could opt instead to just flip a coin.
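For anyone who would rather script this than flip a coin, here is a minimal sketch using Python's standard `random` module. It follows the 0/1 coding described above. The function name `randomize_pairs` and the `seed` parameter are my own inventions for illustration.

```python
import random


def randomize_pairs(n_pairs, seed=None):
    """Return a list of (first, second) fruit orders, one per pair of trials.

    A draw of 0 codes for strawberries first and bananas second;
    a draw of 1 codes for the opposite.
    """
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_pairs):
        if rng.randint(0, 1) == 0:
            schedule.append(("strawberries", "bananas"))
        else:
            schedule.append(("bananas", "strawberries"))
    return schedule


# Example: plan five pairs of trials. Fixing the seed only makes the
# schedule reproducible; leave it as None for a fresh random plan.
schedule = randomize_pairs(5, seed=42)
for number, (first, second) in enumerate(schedule, start=1):
    print(f"pair {number}: {first} first, then {second}")
```

The important part is to generate the whole schedule before starting and then stick to it, so that the order is never chosen by how we happen to feel on a given morning.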
How to Choose the Number of Trials
Our ultimate goal is to determine, in this example, whether my average blood sugar response to one fruit is different from my average response to the other. If my response to each fruit is very consistent, I might get by with just three measurements for each fruit. If it's very inconsistent, it will be harder for me to estimate my average response and making this estimation will require a greater number of trials. This will become clearer below.
How to Tell If the Responses Are Different
So how do we tell if my response to bananas is different from my response to strawberries? The short answer is I should plug the data into some simple statistical software and run a t-test. You can do this for free here:
If my response to each fruit is consistent, I should only need to do about three tests with each of them.
If my response to each fruit is more variable, I might have to do more. As a good rule of thumb, we could start with three and see if there's a significant difference. If not, we could run a couple more tests and see if it gets closer to significance. There are more rigorous ways to determine the sample size we need, but sheesh we're not trying to justify ourselves at the feet of some bureaucracy or publish a paper here, so I think we can cut a few corners. We just need to be careful of bias — we don't want to keep performing the experiment until we get the result we want and then stop.
If we want to be really careful about this, though, we could perform a few tests to guesstimate the “n” we need and then ignore all these results and start afresh, committing ourselves to a specific number of observations and then patting ourselves on the back for our objectivity.
In order to try to maintain as little variation as possible, and thus be able to get away with fewer trials of each fruit, we should attempt to keep any conditions we can think of as consistent as possible. For example, we should conduct the test at the same time of day, having fasted for a similar length of time since our last meal. Random differences in such conditions will not destroy the interpretation of the experiment but will decrease our statistical precision and require us to repeat more observations.
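For the curious, the t-test described in this section can be sketched in plain Python. This is only an illustration under stated assumptions: it uses the classic pooled-variance (Student's) version of the two-sample test, it gets the p-value by numerically integrating the t distribution rather than calling a statistics package, and the readings are invented. The function names are hypothetical.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=10_000):
    """Two-sided p-value via Simpson's rule on the t density from 0 to |t|."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)


def pooled_t_test(a, b):
    """Student's two-sample t-test, assuming equal variances in both groups."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(a) - mean(b)) / se
    return t, df, two_sided_p(t, df)


# Hypothetical peak readings in mg/dL, three trials per fruit.
bananas = [140, 138, 143]
strawberries = [125, 128, 124]

t, df, p = pooled_t_test(bananas, strawberries)
print(f"t = {t:.2f}, df = {df}, two-sided p = {p:.4f}")
```

With responses this consistent, three trials per fruit are enough to show a clear difference. If the readings within each fruit were more scattered, the same gap between the averages would give a much larger p-value. Note that the sketch would fail with a division by zero if every reading within both groups were identical.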
A Few Technical Considerations
There are two technical problems that could arise related to the independence of the trials. We want to minimize any effect that one trial might have on another. We can imagine a couple of situations where that could be a problem.
The first problem is carryover from one trial to the next. For example, say we are taking a vitamin supplement. The supplement might take a few days to clear from our system, so we would want to separate the trials by at least a few days. This is called a wash-out period. Having a sufficient wash-out period between trials can help guarantee their independence.
The second problem is that there could be a time-dependent trend. For example, if we are eating a low-carbohydrate diet and suddenly we start to run tests on our blood sugar response to different fruits, we may steadily adapt to eating fruit over several weeks and our blood sugar responses may steadily improve. In this case, we can increase our statistical precision by using a paired t-test. To do this we simply pair the first two trials, then the second two, the third two, and so on. How to do this should be apparent after clicking on the above link to use the free t-test program.
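Here is a minimal sketch of the paired calculation, assuming the trials were run as consecutive banana/strawberry pairs. The readings are invented to mimic a steady downward trend over time, and the function name `paired_t` is hypothetical. The sketch computes only the t statistic and its degrees of freedom; the p-value would come from a t-table or statistical software.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t statistic and degrees of freedom for matched readings."""
    if len(first) != len(second):
        raise ValueError("each trial needs a partner")
    # Work with the within-pair differences, so a trend shared by both
    # members of a pair cancels out.
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical peak readings in mg/dL; both fruits improve over the weeks.
bananas = [150, 144, 139, 133, 128]
strawberries = [143, 138, 131, 127, 121]

t, df = paired_t(bananas, strawberries)
print(f"paired t = {t:.2f} with {df} degrees of freedom")
```

Because each difference is taken within a pair, the downward trend drops out and only the pair-to-pair scatter in the differences remains. With 4 degrees of freedom, the standard table value for significance at the 5% level (two-sided) is about 2.776, so any paired t larger than that in absolute value would count as a significant difference.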
We Don't Need to Know Everything
It would, quite clearly, be foolish to rest on definitive demonstrations of cause-and-effect for everything we do. This would be paralyzing. It's quite clear that if someone wants to go gluten-free for six months, they're not going to repeat this three or five times, randomly alternating with a six-month gluten-gobbling period.
A randomized, controlled self-experiment is the ideal form of self-experimentation but this doesn't mean we should ignore the rest of our personal experience. We can, at a minimum, demonstrate that making a given dietary change is at least consistent with improved health simply by experiencing such an improvement in health after making such a dietary change. We only have one life to live, and the most sensible thing might be to stick with what seems to work and move on.
Even so, understanding the essential role of randomization and repeated observations in demonstrating cause and effect can help us interpret that experience. Realizing that many of our past experiences may not provide us with definitive cause-and-effect information can help us infuse some flexibility into our dietary theories and make the changes we might need to make now or in the future rather than getting trapped into dietary dogmatism.
Where practical, however, a randomized, controlled self-experiment can provide valuable information. In the future, I'll conduct a few of these on myself and post about them.
* Mr. Rumsfeld and I go way back. One time in the 1990s when we were working on the Dole campaign together, he got so angry at a mouse who chewed through all his packets of NutraSweet in the middle of the night that he wanted to blow through a nest it had burrowed in the wall with a nuclear warhead. I reasoned with him that this could backfire and create a public relations disaster, and he backed down. I always found Rumsfeld's temper disturbing, but the compelling simplicity of his approach to statistical analysis remains beyond reproach to this day. I often wonder how the world would be different if Mr. Rumsfeld had chosen this discipline as his profession, but as he would always say to me, “One can never randomize the universe to alternative histories or futures with an n of 1.” Or as others say, you only have one life to live.
Acknowledgment: Special thanks to NYC-based statistical consultant Karen A. Buck for discussing this concept with me.