How do they measure the effectiveness of birth control?

Dear Cecil: How is the effectiveness of contraception measured? Do they survey people? Could researchers randomize different birth control methods, even if they wanted to? As much as I’d like to, I don’t think I could have nearly as much sex as it would take to make a statistically significant sample. Christine

Cecil replies:

Given the stakes involved — higher than those associated with, say, nasal decongestant — you’d certainly hope there’s plenty of published research to confirm that birth control really does what it’s supposed to. And sure enough, there is. Though gauging contraceptives’ effectiveness isn’t quite the grueling sexual slog you apparently imagine, you’re right to guess that logistical and ethical concerns make this task somewhat trickier than figuring out how many noses got unstuffed.

Typically researchers test a birth control method about the same way they’d test any drug or medical device — via randomized controlled trial. Participants are assigned randomly to one of several groups: some use the contraceptive that’s under scrutiny; others use some previously tested treatment to establish a baseline — that’s the control group. So when pharmaceutical docs tested a transdermal contraceptive patch in 2001, the control group got the pill; in a 1999 trial of polyurethane condoms, the controls used the latex kind.

What you won’t see in these studies, for obvious reasons, is a placebo control group: assuming your volunteers genuinely don’t want to get pregnant, you can’t just give some of them a sugar pill and tell them it’s the pill. Similarly, there usually isn’t a “no-method” group to compare to; if researchers want a baseline conception rate for young women regularly having sex without contraception, they may use an estimate based on external data. (A decent guess is that something like 85 percent of such women would get pregnant within a year.)

And despite your concern, Christine, there’s no need for any one subject to shoulder the sample-size burden herself; the subjects enrolled in these studies regularly number in the thousands. FDA guidelines for condom-effectiveness studies, for instance, recommend at least 400 subject couples over a minimum of six menstrual cycles; testing may be conducted “outside of clinical care settings.” (Most participants prefer it that way, you’d figure, though undoubtedly not all.) But with the real action taking place offsite, test results depend at least in part on subjects’ self-reporting: in that 1999 condom study, participants kept “coital diaries” to record frequency of use, breakage and slippage events, etc.

To compare various contraceptives across multiple studies, you need a single apples-to-apples measurement of effectiveness. The most common is something called the Pearl Index, which professes to quantify how many pregnancies a birth control method will let through per 100 woman-years of use: the lower the number, the more likely the method is to keep you fetus-free. Devised back in 1933, the Pearl Index enjoys the advantage of being simple to calculate: you divide the number of pregnancies during a contraceptive study by the total months of exposure (roughly, the number of participants using the method times the number of months each stayed in the study), then multiply by 1,200. That’s it. Spermicide used alone might score as high as 20; the pill is somewhere between 0.1 and 3.
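For the arithmetically inclined, here’s the recipe above as a minimal Python sketch. The function names are made up for illustration, and the trial figures in the example are hypothetical, not drawn from any real study. The 1,200 is just 100 women times 12 months, which turns a per-woman-month rate into a per-100-woman-years rate; summing each participant’s actual months of use matters because women who quit early contribute less exposure than the full study length.

```python
def pearl_index(pregnancies: int, exposure_months: float) -> float:
    """Pregnancies per 100 woman-years of use.

    1,200 = 100 women x 12 months, which converts a rate per
    woman-month into a rate per 100 woman-years.
    """
    if exposure_months <= 0:
        raise ValueError("exposure_months must be positive")
    return pregnancies * 1200 / exposure_months


def total_exposure(months_each: list[int]) -> int:
    """Hypothetical helper: add up each participant's months of use.

    Women who drop out early contribute fewer months than the full
    study length, so exposure is summed woman by woman.
    """
    return sum(months_each)


# Hypothetical trial: 500 women, each followed for 12 months, 6 pregnancies.
print(pearl_index(6, total_exposure([12] * 500)))  # 1.2
```

A score of 1.2 would mean roughly 1.2 pregnancies for every 100 women using the method for a year.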

Simple, or too simple? A big problem with the Pearl Index is that it assumes the failure rate stays constant from month to month, and that just ain’t so. The longer a contraceptive trial continues, the rarer pregnancies become. Why? The most fertile women conceive early and drop out of the study; the women who remain may be less pregnancy-prone, or they may have grown increasingly adept at using the birth control method. Long trials, then, tend to produce lower Pearl numbers, and thus can’t be compared fairly to shorter ones. For this reason, many researchers prefer a stat format called life tables (or decrement tables), which show results broken out month by month instead.
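To see why a long trial drags the Pearl Index down, and what a life table reports instead, here’s a rough Python sketch on invented numbers: a hypothetical cohort of 1,000 women whose monthly pregnancy count shrinks over a year, with 20 more dropping out each month for unrelated reasons. The names `life_table` and `pearl_over_months` are illustrative, not from any published package. The sketch uses a simple product-limit calculation (multiply together each month’s odds of not conceiving) and, as a simplification, counts every woman at risk at the start of a month as contributing a full month of exposure; real decrement-table methods handle partial months and dropouts more carefully.

```python
from dataclasses import dataclass


@dataclass
class MonthRow:
    month: int
    at_risk: int               # women still in the study at the start of the month
    pregnancies: int           # pregnancies occurring during the month
    monthly_rate: float        # pregnancies / at_risk for this month alone
    cumulative_failure: float  # estimated probability of pregnancy by month's end


def life_table(start: int, pregnancies: list[int], withdrawals: list[int]) -> list[MonthRow]:
    """Product-limit life table: chain together month-by-month failure rates."""
    rows = []
    at_risk = start
    not_pregnant = 1.0  # running probability of getting through without conceiving
    for month, (preg, gone) in enumerate(zip(pregnancies, withdrawals), start=1):
        rate = preg / at_risk
        not_pregnant *= 1.0 - rate
        rows.append(MonthRow(month, at_risk, preg, rate, 1.0 - not_pregnant))
        # Women who conceived or quit for other reasons leave the risk set.
        at_risk -= preg + gone
    return rows


def pearl_over_months(rows: list[MonthRow], months: int) -> float:
    """Pearl Index computed as if the trial had stopped after `months`."""
    used = rows[:months]
    pregnancies = sum(r.pregnancies for r in used)
    exposure = sum(r.at_risk for r in used)  # simplification: full month each
    return pregnancies * 1200 / exposure


# Hypothetical data: pregnancies taper off as the most fertile women conceive early.
rows = life_table(1000, [10, 8, 6, 5, 4, 3, 3, 2, 2, 2, 1, 1], [20] * 12)
for r in rows:
    print(f"month {r.month:2d}: rate {r.monthly_rate:.4f}, cumulative {r.cumulative_failure:.3f}")
print(f"Pearl, 3-month trial:  {pearl_over_months(rows, 3):.2f}")   # 9.89
print(f"Pearl, 12-month trial: {pearl_over_months(rows, 12):.2f}")  # 5.47
```

Same women, same method, yet stopping the hypothetical trial at three months roughly doubles the Pearl score. The life table sidesteps that by reporting the cumulative chance of pregnancy at each month, so a 3-month study and a 12-month study can be compared at the same point in time.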

But much of what we know about relative contraceptive effectiveness isn’t based on clinical trials at all. For decades now, Princeton population researcher James Trussell has been compiling and reviewing current data on birth control use for a series of reports called “Contraceptive Failure in the United States.” In setting out his 2011 charts of unintended pregnancy rates, Trussell leans less on test results than on women’s responses (adjusted appropriately) from the long-running National Survey of Family Growth, run by the Centers for Disease Control and Prevention. Now, it’s the CDC, so the survey is conducted with the utmost rigor. But correcting for known distortions in the data, Trussell suggests, is complicated, to say the least: survey participants regularly underreport abortions, for instance, so some unintended pregnancies never get counted; but if you adjust for this by surveying women seeking abortions in clinics, they tend to overreport that they really were using contraception, so you count too many failures.

If we’re always having to take the subjects’ word for it, you may wonder, how do we reliably distinguish between contraception failure — called “perfect-use failure” in the literature — and user error? This issue isn’t lost on Trussell: “Additional empirically-based estimates of pregnancy rates during perfect use are needed,” he concludes. The march of science is being held back, it seems, because there aren’t enough folks who can roll a condom on correctly every time.

Cecil Adams

Send questions to Cecil via cecil@straightdope.com.