MERRITT, Chief Judge.
For a judicial system founded on the premise that justice and consistency are related ideas, the inconsistent results reached by courts and juries nationwide on the question of causation in Bendectin birth defect cases are of serious concern. In this Bendectin causation case, Judge Eugene Siler concluded that the evidence adduced
The general issue for review here is whether the trial judge erred by withdrawing the case from the jury and by granting summary judgment for the defendant pharmaceutical company. The more specific issues are, first, whether a court should judge for itself the validity of the reasoning process by which various competing qualified experts have reached their conclusions or should instead leave that question for the jury; and second, whether the evidence in this case, if so reviewed, is sufficient to withstand the defendant's motion for summary judgment.
We agree with Judge Siler that, although judges should respect scientific opinion and recognize their own limited scientific knowledge, nevertheless courts have a duty to inspect the reasoning of qualified scientific experts to determine whether a case should go to the jury. Based on the record before us, we also agree with Judge Siler that whether Bendectin caused the minor plaintiff's birth defects is not known and is not capable of being proved to the requisite degree of legal probability based on the scientific evidence currently available. Taken in the light most favorable to the plaintiffs, the scientific evidence that provides the foundation for the expert opinion on causation in this case is not sufficient to allow a jury to find that it is more probable than not that Bendectin caused the minor plaintiff's injury. Therefore the case should not go to a jury.
We will first summarize the Bendectin causation issue and the case law that has developed during the past twelve years. We will then analyze the evidence in greater detail and show why it does not meet the legal test of causation.
I. Overview
The nausea of morning sickness affects many pregnant women and, although the causes are not completely understood, in extreme cases may cause permanent injury to the sufferer's unborn child. Merrell Dow manufactured and marketed Bendectin as an anti-nauseant prescription for morning sickness from 1956 until 1983 when it took the drug off the market despite continued approval from the Food and Drug Administration. Estimates indicate that Bendectin was prescribed from 1957 until 1982 to over 30 million women worldwide and to more than 17.5 million women in the United States. These women commonly took Bendectin during the first trimester of pregnancy.
Approximately seven weeks after becoming pregnant, Betty Turpin ingested Bendectin to combat morning sickness. The initial development of the fetus's fingers and toes occurs some four to eight weeks after conception. Seven months after Ms. Turpin first took the drug, her child, Brandy Turpin, the infant plaintiff in this case, was born with "limb reduction defects": severely deformed hands and feet, specifically fused joints and shortened or missing fingers and toes. Ms. Turpin took no other drugs during the course of her pregnancy, nor can her child's deformities be traced to any known genetic disorders.
Causation here is a matter of trying to measure probabilities. It requires a complex series of inferences drawn from scientific experiment and observation and statistical comparisons. For example, the plaintiffs rely primarily on animal experiments from which an inference is drawn that since chemical compounds in Bendectin, if administered at certain levels, cause birth defects in animals, they may cause similar defects in humans. The plaintiffs draw a further inference that Bendectin caused the birth defects in this particular case. These inferences are necessary because physicians who treated Brandy Turpin and other similarly situated children cannot diagnose the cause of these anomalies.
The defendant, too, reasons from the results of scientific studies to a particularized conclusion with respect to these plaintiffs. Merrell Dow relies primarily on statistical studies that purport to show that the incidence of certain birth defects is no higher
The causation proof in Bendectin birth defect cases is offered by expert witnesses who speak in terms of population groups and statistical samples rather than specific individuals. The expert witnesses on each side are often the same, from case to case, and even when different the scientific conclusions and theories are based on the same or similar statistical studies and scientific experiments. The cases are variations on a theme, somewhat like an orchestra which travels to different music halls, substituting musicians from time to time but playing essentially the same repertoire.
A brief survey of the reported Bendectin cases illustrates the inconsistency of courts that have dealt with the scientific problem of causation. We find only one reported case finally upholding a finding of causation. In Oxendine v. Merrell Dow Pharmaceuticals, Inc., 506 A.2d 1100 (D.C.App. 1986), aff'd in part on appeal after remand, 563 A.2d 330 (D.C.App.1989), cert. denied, 493 U.S. 1074, 110 S.Ct. 1121, 107 L.Ed.2d 1028 (1990), the appellate court reversed the trial court's grant of a judgment n.o.v. and motion for new trial to the defendant and reinstated the jury's $750,000 verdict for the plaintiffs. On the other hand, in four other reported cases, juries returned verdicts for the defense which were allowed to stand. Wilson v. Merrell Dow Pharmaceuticals, 893 F.2d 1149 (10th Cir.1990) (affirming judgment for the defendant and noting also that the plaintiffs' motion for judgment n.o.v. was correctly denied by the district judge); Will v. Richardson-Merrell, Inc., 647 F.Supp. 544 (S.D.Ga.1986) (denying plaintiffs' motion for judgment n.o.v.); In re Richardson-Merrell, Inc. "Bendectin" Products Liability Litigation, 624 F.Supp. 1212 (S.D.Ohio 1985), aff'd, 857 F.2d 290 (6th Cir.1988) (denying plaintiffs' motion for judgment n.o.v. in an order addressing 818 of 844 consolidated multidistrict cases in the largest of all Bendectin cases); and Cosgrove v. Merrell Dow Pharmaceuticals, Inc., 117 Idaho 470, 788 P.2d 1293 (1990) (affirming jury's finding that Bendectin was not the proximate cause of child's injuries).
Four federal circuits have held that plaintiffs failed as a matter of law to establish causation of birth defects. The Fifth Circuit, without ruling specifically on the admissibility of the plaintiffs' expert testimony, reversed a jury verdict for the plaintiffs and granted judgment n.o.v. to the defendant because adequate proof of causation was lacking. Brock v. Merrell Dow Pharmaceuticals, Inc., 874 F.2d 307, reh'g. denied, 884 F.2d 166 (5th Cir.1989), cert. denied, 494 U.S. 1046, 110 S.Ct. 1511, 108 L.Ed.2d 646 (1990), limited by Christopherson v. Allied-Signal Corp., 902 F.2d 362, 367 (5th Cir.1990), rev'd on reh'g on other grounds, 939 F.2d 1106 (5th Cir.1991) (en banc). Another circuit, the Ninth, affirmed a grant of summary judgment for the defendant after holding that the plaintiffs' reanalyses of Merrell Dow's epidemiological studies were unreliable for lack of peer review. Daubert v. Merrell Dow Pharmaceuticals, Inc., 951 F.2d 1128 (9th Cir.1991). Two other circuits reached the same result by ruling inadmissible the plaintiffs' expert testimony on grounds that it was not the type "reasonably relied upon" by qualified experts in the specific fields of study. Richardson v. Richardson-Merrell, Inc., 857 F.2d 823 (D.C.Cir. 1988), cert. denied, 493 U.S. 882, 110 S.Ct. 218, 107 L.Ed.2d 171 (1989) (reversing jury verdict for the plaintiff; in the face of the defendant's epidemiological evidence, an insufficient foundation existed for the plaintiffs' animal and chemical studies); Lynch v. Merrell-Nat'l Labs., 830 F.2d 1190 (1st Cir.1987) (holding that the plaintiff's in vivo and in vitro studies were inadmissible; therefore, insufficient evidence existed to avoid summary judgment for the defendant); see also Ealy v. Richardson-Merrell, Inc., 897 F.2d 1159 (D.C.Cir.), cert. denied, ___ U.S. ___, 111 S.Ct. 370, 112 L.Ed.2d 332 (1990) (reversing jury verdict for the plaintiff for $20 million in compensatory
Four District Court cases nationwide have granted summary judgment to the defendant for various reasons. Lee v. Richardson-Merrell, Inc., 772 F.Supp. 1027 (W.D.Tenn.1991) (relying on Richardson, Brock, and Judge Siler's opinion in this case); Cadarian v. Merrell Dow Pharmaceuticals, Inc., 745 F.Supp. 409 (E.D.Mich.1989) (holding that an inadequate foundation existed for expert's opinion); Hull v. Merrell Dow Pharmaceuticals, Inc., 700 F.Supp. 28 (S.D.Fla.1988) (finding that the body of scientific literature established Bendectin's safety and that the infant plaintiff's mother took the drug too late in her pregnancy to affect the fetus); and Monahan v. Merrell-Nat'l Labs., No. 83-3108-WD, 1987 WL 90269 (D.Mass. Dec. 18, 1987) (finding that summary judgment for the defendant was required under the First Circuit's earlier holding in Lynch).
In contrast, other courts have either denied or reversed on appeal grants of summary judgment for the defendant in seven cases. In DeLuca v. Merrell Dow Pharmaceuticals, Inc., 911 F.2d 941 (3rd Cir. 1990), the Third Circuit reversed the trial court's grant of summary judgment to Merrell Dow and remanded for the District Court to analyze the reasonableness of an expert witness's epidemiological opinion of causation under Federal Rule of Evidence 702. See also Longmore v. Merrell-Dow Pharmaceuticals, Inc., 737 F.Supp. 1117 (D.Idaho 1990) (expressly declining to adopt Richardson, Lynch, and Brock approaches); In re Bendectin Products Liability Litigation, 732 F.Supp. 744 (E.D.Mich.1990) (holding that collateral estoppel did not bar the plaintiffs on the issue of causation and that, in the face of experts' disagreements on necessity of epidemiological proof, the court could not reject other types of evidence); and DePyper v. Navarro, No. 116390 (Mich.Ct.App. May 9, 1991) (holding that the trial court erred under state law by not inquiring whether experts in field generally accepted the methodology of the plaintiffs' expert). For other denials of summary judgment or denials of the defendant's motions for directed verdict, see Hagen v. Richardson-Merrell, Inc., 697 F.Supp. 334 (N.D.Ill.1988) (denying summary judgment on causation but granting summary judgment on punitive damages); Mangels v. Richardson-Merrell, Inc., No. R-83-3272 (D.Md. Aug. 17, 1987) (summary judgment denied because a triable issue of fact existed for jury resolution); and Lanzilotti v. Merrell Dow Pharmaceuticals, Inc., No. 82-0183, 1986 WL 7832 (E.D.Pa. July 10, 1986) (denying the defendant's motion for a directed verdict).
The fundamental reasons for the inconsistency of the legal system in handling Bendectin claims appear to be first, the difficulty of scientists and hence of judges, lawyers and jurors in knowing what reasonable inferences of causation to draw from animal experiments and epidemiological studies; and second, the uncertainty of judges about how far they should enter the scientific thicket of conflicting inferences in order to determine whether the basis of a scientific opinion concerning causation is sufficiently plausible to allow a jury to ground a verdict on it. There are two important questions here: How hard should judges look at the reasonableness of scientific theories and inferences before they decide whether there is enough to the case for it to go to the jury? If we apply a "hard look" doctrine, as we are inclined to do in scientific cases based primarily on expert testimony, what exactly are the general scientific experiments and studies capable of showing about whether Bendectin causes birth defects in a particular case?
We believe that close judicial analysis of such technical and specialized matter is necessary not only because of the likelihood of juror misunderstanding, but also because expert witnesses are not necessarily always unbiased scientists. They are paid by one side for their testimony. Although
II.
In this legal context, we review the evidence and arguments offered by the parties on summary judgment. In determining and applying the correct standards of proof on summary judgment in scientific cases, we look to the rules of sufficiency of the evidence to decide whether juries should be allowed to hear the evidence as well as the rules of admissibility of expert testimony that shape the facts and opinions to be considered. This case, we believe, should be decided on the rules of the sufficiency of evidence of causation on summary judgment, as Judge Siler held below in the alternative.
In the instant case, the plaintiffs claim that their infant daughter's birth defects were caused by the mother's use of Bendectin during her pregnancy and specifically by one ingredient, doxylamine succinate. The plaintiffs' case relies on animal studies and attacks the defendant's epidemiological studies. The defendant's case relies on the epidemiological studies and attacks the animal studies.
The plaintiffs offered expert opinions from ten witnesses in eight scientific fields to assess whether Bendectin is "teratogenic," i.e., capable of causing birth defects. These opinions were based on in vitro and in vivo animal studies, and reassessment of the defendant's epidemiological studies derived from study of humans. In support of its motion for summary judgment, the defendant relies primarily on 35 human epidemiological studies supporting a finding that the use of Bendectin does not cause birth defects. Some of these studies were conducted by scientists under contract with the defendant. Others were independent.
A. Defendant's Proof — Epidemiological Studies
Both sides appear to accept the fact that limb defects generally appear in less than one in 1,000 live births. The defendant's proof consists in large measure of the 35 extant studies published in medical and scientific journals on the statistical relationship between the use of Bendectin and the incidence of various forms of birth defects in babies, none of which conclude that a causal connection exists. For an extended explanation of the complex statistical methodology used in such epidemiology studies, including the use of such terms of art as the "null hypothesis," "significance testing," "P value," "relative risk" and "confidence interval," see Part 1.B of the Third Circuit's recent Bendectin opinion, DeLuca v. Merrell Dow Pharmaceuticals, Inc., 911 F.2d 941, 945-49 (3rd Cir.1990).
1. The San Francisco Study: Two University of California researchers studied effects of six anti-nauseant drugs on 11,481 pregnancies in the San Francisco area over seven years. Bendectin was prescribed in only 628 of these cases. Birth defects were monitored at three ages: one month, one year, and five years (though limb defects in particular were not isolated and reported). The average rate of all types of birth defects for Bendectin cases at the age of one month was 0.8 in every 100 births. This average rate is less than the average rate of the birth defects found in mothers who did not use Bendectin. The comparative rates at the one and five-year periods were similar. Although no specific relative risk was assigned to Bendectin use, the authors concluded that Bendectin, when taken at a recommended dosage level, was not teratogenic. Lucille Milkovich & Bea Van den Berg, An Evaluation of the Teratogenicity of Certain Antinauseant Drugs, 125 Am. J. Obstetrics & Gynecology 244, 245-48 (1976).
2. The Boston and Harvard Study: Six doctors from Boston University and the Harvard School of Public Health evaluated a group study of 50,282 mothers and children for Bendectin's possible effect on
3. The Atlanta Study: Over 280,000 births were monitored in the Atlanta area over a ten-year period for maternal exposure to various drugs including Bendectin. This population base was twenty times larger than that in the San Francisco study. In 1,231 birth defect cases, 117 of the mothers, or 9.5 percent, took Bendectin. Of 129 children born with limb defects, 14 (10.9 percent) had mothers who took Bendectin. The study calculated a relative risk of 1.18, with a 95 percent confidence interval between 0.65 and 2.13. However, for one subgroup of limb defects known as the "amniotic band complex," a higher relative risk of 3.88 was reported. Therefore, the risk of a Bendectin-exposed mother giving birth to a child with this specific condition was almost four times as great as that which would occur in a population of non-exposed mothers. Two other forms of birth defects (herniated-brain and esophageal defects) had relative risks of 1.84 and 2.47, with 95 percent confidence intervals between 0.63 and 5.37 and between 0.84 and 4.89, respectively.
For most birth defects, including limb defects, the authors determined that no "statistically significant" associations could be traced. As for the three above-mentioned possible associations, the authors noted possible confounding factors that might weaken the power of such associations, so that "the data [did] not suggest that Bendectin is causally associated" with birth defects. Jose Cordero et al., Is Bendectin a Teratogen?, 245 JAMA 2307 (1981).
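The arithmetic behind the Atlanta figures can be checked with a minimal Python sketch. It uses only the counts and the limb-defect interval quoted above. The function names are illustrative assumptions, and the rule that an interval containing 1.0 indicates no statistically significant association at that level is the usual statistical convention, not a statement of the study's own method.

```python
# Counts quoted above from the summary of the Atlanta study.
ALL_DEFECT_CASES = 1231
ALL_DEFECT_BENDECTIN = 117
LIMB_DEFECT_CASES = 129
LIMB_DEFECT_BENDECTIN = 14

# Reported 95 percent confidence interval for the limb-defect relative risk of 1.18.
LIMB_INTERVAL = (0.65, 2.13)


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def includes_no_effect(interval, null_value=1.0):
    """True when the interval contains the 'no difference' relative risk of 1.0."""
    low, high = interval
    return low <= null_value <= high


share_all = percent(ALL_DEFECT_BENDECTIN, ALL_DEFECT_CASES)
share_limb = percent(LIMB_DEFECT_BENDECTIN, LIMB_DEFECT_CASES)
limb_not_significant = includes_no_effect(LIMB_INTERVAL)

print(f"Bendectin exposure among all defect cases: {share_all:.1f}%")
print(f"Bendectin exposure among limb defect cases: {share_limb:.1f}%")
print(f"Limb-defect interval includes 1.0: {limb_not_significant}")
```

Because the limb-defect interval runs from below 1.0 to above 1.0, the data are compatible both with no effect and with a roughly doubled risk, which is consistent with the authors' report of no statistically significant association.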
4. Pyloric Valve Defects: The highest associations found between Bendectin use and birth defects were focused not on limb defects but on pyloric stenosis (abnormal constriction of the stomach's pyloric valve). In one Yale School of Medicine study, a relative risk of 1.40 was detected between mothers using Bendectin (1,427 cases) and non-users (3,001) for birth deformities. When the survey exclusively focused on infants with pyloric valve defects, six mothers taking Bendectin gave birth to children with this defect, as opposed to 29 mothers who did not use the drug. At a 95 percent confidence interval between 1.75 and 10.75, the relative risk of this stomach valve defect was 4.33. "Thus, more than one in 10 cases of pyloric stenosis may be due to maternal use of Bendectin," although no direct causal relationship could be ascertained. Brenda Eskenazi & Michael Bracken, Bendectin (Debendox) As a Risk Factor for Pyloric Stenosis, 144 Am.J. Obstetrics & Gynecology 919, 921-24 (1982). The same study noted that one child with a limb reduction defect was born to a mother taking Bendectin, while five defects occurred in children of women not taking the drug. Although a relative risk of 4.19 resulted for limb defects, this was regarded as "nonsignificant" by Eskenazi and Bracken, id. at 923, possibly due to the smallness of the group studied and the wide range of the confidence interval.
A later study conducted by Boston University Medical Center of 13,346 births around Puget Sound tended to support Eskenazi's and Bracken's findings. In 3,385 cases involving Bendectin use, 13 babies were born with pyloric valve defects,
5. The Sydney Study: University of Sydney researchers compared pregnancy histories for mothers of 155 children born with limb reduction defects with those for the mothers of 274 control group children; 26 percent of the 429 mothers in both groups used Bendectin during the first trimester of pregnancy. The relative risk resulting was 1.1, with a 95 percent confidence interval between 0.8 and 1.5. The Australian researchers concluded that "[o]n these figures, there is no evidence that women who take [Bendectin] ... are more likely to bear a limb-deficient child than women who do not take this drug." Janet McCredie et al., The Innocent Bystander, 140 Med.J.Austl. 525, 526-27 (1984).
6. The National Institutes of Health Study: In the most recent Bendectin study, two National Institutes of Health researchers evaluated 31,564 births in Northern California. Of those women, 2,771 (nine percent) had used Bendectin. For 58 categories of defects studied (limb defects, however, were not specifically monitored), 135 defects occurred in cases of Bendectin exposure, while 1,439 defects occurred in non-exposed cases. Relative risks were greatest in three categories: lung defects (4.6), microcephaly, i.e., small head size (5.3), and cataracts (5.3). The 95 percent confidence intervals varied widely for these three categories, ranging from 1.9 to 10.9 for lung defects, 1.8 to 15.6 for microcephaly, and 1.2 to 24.3 for cataracts.
Despite these findings, the authors surmised that the three statistically significant defect groups were possibly spurious, as they were "exactly the number ... that would have been expected by chance." How this determination was made, however, is not specified by the authors. The authors concluded that no increase in overall rates of defects existed after Bendectin use and that the three associations "are unlikely to be causal." Patricia Shiono & Mark Klebanoff, Bendectin and Human Congenital Malformations, 40 Teratology 151, 152-55 (1989).
In addition to the 35 epidemiological studies, the defendant also offers as evidence the fact that no one has detected a decrease in the incidence of birth defects after Bendectin was removed from the market in 1983. Dr. Lamm so testified for the defendant based on a number of studies.
B. Plaintiffs' Challenge to the Epidemiological Proof
The plaintiffs claim that the defendant's 35 studies are based on samples which are too small to prove the absence of causation in light of the infrequency of instances of birth defects in general; that they do not adequately isolate limb reduction defects from other birth defects; that they do not
At least two expert witnesses for the plaintiffs attack the persuasive force of the defendant's statistical comparisons of the incidence of Bendectin-related birth defects. Dr. Glasser criticized the defendant's use of studies by Aselton, Jick, Cordero and Eskenazi as not correctly considering other birth defects, such as heart and pyloric valve defects or cleft palates, in assessing Bendectin's capacity for limb birth defects. In his affidavit, Dr. Glasser also criticized the Cordero, Eskenazi and McCredie studies for incorrectly inferring that no association existed between Bendectin use and infant limb reduction. Dr. Swan, similarly, rejected these studies' sole reliance on a relative risk of 1.0 within a 95 percent confidence interval as a basis for concluding that Bendectin does not cause birth defects in humans. Dr. Swan further claimed that several of the studies were conducted using insufficient populations or control groups, so that scientists wrongly calculated exposures to the drug. Dr. Swan viewed these and other factors as confounding the validity and power of such reports; however, both Drs. Glasser and Swan relied on these studies as the basis for their own recalculations, using a lower confidence interval that is claimed to derive a higher relative risk. Both experts concluded from their own reassessments that to a reasonable degree of epidemiological certainty, there is some association between Bendectin and limb reduction defects.
Although we agree with the defendant that its epidemiological studies and Dr. Lamm's testimony constitute evidence on which a jury might ground a defendant's verdict, we agree with the plaintiffs' experts that this evidence is by no means conclusive. The defendant's claim overstates the persuasive power of these statistical studies. An analysis of this evidence demonstrates that it is possible that Bendectin causes birth defects even though these studies do not detect a significant association.
Limb reduction defects occur in such a small percentage of both Bendectin and non-Bendectin live births — as noted, these occur in less than one in every 1,000 — that it would take a carefully controlled comparison of a very large number of births to instill confidence in the predictive power of the outcome. Also, many of the defendant's studies apparently do not control for many factors that may be crucial for scientists to accord great weight to the studies, such as the stage of pregnancy during which the mother took Bendectin, the other drugs the mother may have taken, or other harmful conditions, natural or otherwise, that may have been part of the mother's environment.
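The sample-size point can be made concrete with a rough Python sketch. It multiplies the number of Bendectin-exposed births reported in three of the studies summarized above by a baseline rate of exactly one limb defect per 1,000 live births. That rate is an assumption taken as the upper bound of the "less than one in 1,000" figure, and the dictionary labels and function name are illustrative only.

```python
# Assumption: treat "less than one in 1,000" as a rate of exactly 1 per 1,000,
# which overstates the number of cases a study could expect to observe.
BASELINE_RATE = 1 / 1000

# Bendectin-exposed births as reported in the study summaries above.
EXPOSED_BIRTHS = {
    "San Francisco study": 628,
    "Puget Sound study": 3385,
    "Northern California study": 2771,
}


def expected_cases(n_births, rate=BASELINE_RATE):
    """Expected number of limb-defect births among n_births at the given rate."""
    return n_births * rate


expected = {name: expected_cases(n) for name, n in EXPOSED_BIRTHS.items()}

for name, value in expected.items():
    print(f"{name}: about {value:.1f} expected limb-defect births among exposed mothers")
```

On these assumptions each exposed group would be expected to contain fewer than four limb-defect births, so a difference of one or two cases in either direction could swing a study's apparent result.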
The Bendectin epidemiological studies are examples of a large number of studies of birth defects that demonstrate significant scientific uncertainty concerning their causes. A recent epidemiological study observes that "the etiology of congenital anomalies is poorly understood, with an estimated 60 [percent] of all human birth defects having no known cause." Patricia Olshan et al., Paternal Occupation and Congenital Anomalies in Offspring, 20 Am.J. of Indus.Med. 447 (1991) (exploratory study finding a correlation between birth defects and the occupation of the
C. Plaintiffs' Proof — Animal Studies
The cartilage cells that later become the bones of fingers and toes begin to form in the human embryo during the fourth through eighth weeks of pregnancy. The plaintiffs' theory is that chemical compounds in Bendectin interfere with the formation of these cartilage cells, or chondrogenesis, and that this causal relationship is shown by animal experiments. The plaintiffs' proof includes experiments with animal cells and embryos, known as in vitro studies, performed by developmental biologists to observe possible toxic effects on animal tissue when tested in petri dishes. Other animal experiments, in vivo studies, consisted of tests performed on animals such as rabbits, chickens, monkeys, rats and dogs to determine if Bendectin's ingredients created birth defects at various dosage levels. In these experiments, doxylamine succinate, an ingredient of Bendectin, was injected into animal cells that produce or grow into cartilage that becomes the bones in which limb defects may occur. The plaintiffs' scientific hypothesis based on these studies is this: Because doxylamine succinate interferes with cartilage cell formation in animal cells and test animals, Bendectin is "capable" of causing similar limb defects in humans. The following examples illustrate the nature and findings of these animal cell studies:
1. In vitro studies: As a developmental biologist, Dr. Newman stated that he performs experiments on embryonic cells in petri dishes to determine how those cells develop and create tissue. Due to their similarities to the human embryo, chicken and mice or rat embryos are most frequently used in these studies. Limb-forming cells are removed from the embryo — for chickens, wing and leg formation cells are used — and are isolated in a dish, where selected cells are treated with a suspected teratogen. Changes in cell differentiation between the control group of untreated cells and the exposed group are observed and recorded.
Dr. Newman pointed to experiments on rat cells with substances similar to doxylamine succinate. In these in vitro tests, various defects including limb reduction were observed. Other in vitro tests performed by National Institutes of Health experts and relied upon by Dr. Newman found that doxylamine succinate interfered with cartilage development in mice and chicken limb cells. In one experiment, the addition of 10 micrograms of Bendectin to an animal cell culture reduced one of the components of cartilage cells, proteoglycan, by 30 percent. Similarly, 50 micrograms of Bendectin per milliliter of a culture reduced proteoglycan production by 50 percent, thus suggesting a strong teratogenic effect in the animal cells tested.
Like the other scientists who testify concerning animal experiments, Dr. Newman can only testify that these chemical compounds connected with Bendectin are "capable of causing" limb defects in humans, not that they do cause such defects.
2. In vivo studies: Dr. Gross, a pathologist and veterinary medical expert with the Environmental Protection Agency, described the nature of the in vivo studies proffered by the plaintiffs. In these experiments, suspected teratogens are administered to pregnant female animals. Shortly before their birth, the infants are removed from the mother and studied for defects. Dr. Gross examined a variety of these studies,
A recognized text on teratology states the customary scientific view that "it has become axiomatic in experimental teratology that agents capable of causing any adverse biological effects can usually also be shown to be embryotoxic under the right conditions of dosage, developmental stage, and species susceptibility," and that "virtually all drugs and a great range of chemicals can indeed be shown to be embryotoxic under appropriate laboratory conditions." James Wilson, Current Status of Teratology, in Handbook of Teratology 60 (J. Wilson & C. Fraser, eds. 1977). The author concludes that to "eliminate drugs and chemicals because they can be shown to be embryotoxic at high dosage would be unacceptable" because to do so "would eliminate most drugs and many useful chemicals upon which modern society depends heavily." Id.
The weakness of the plaintiffs' case results from the care with which reputable scientists use animal experiments to predict causation in humans. This weakness arises from the fact that different species of animals react differently to the same stimuli for reasons not entirely understood.
The decisive weakness in the plaintiffs' animal studies is that the factual and theoretical bases articulated for the scientific opinions stated will not support a finding that Bendectin more probably than not caused the birth defect here. On summary judgment, under the doctrine of Celotex Corp. v. Catrett, 477 U.S. 317, 106 S.Ct. 2548, 91 L.Ed.2d 265 (1986), and Anderson v. Liberty Lobby, Inc., 477 U.S. 242, 106 S.Ct. 2505, 91 L.Ed.2d 202 (1986), the expert evidence must show the elements required for a finding of causation. Here, except for Dr. Palmer's testimony discussed below, the plaintiffs' experts stop
Dr. Palmer, a medical doctor, is the only witness who testified in his affidavit that Bendectin caused Brandy Turpin's defects. He stated:
We cannot find, however, that this testimony is anything more than a personal belief or opinion. The grounds for his opinion are subject to the same criticism as the animal studies and epidemiological reanalyses submitted by the plaintiffs' other experts: the evidence cited in support of his conclusion is insufficient to meet the plaintiffs' burden of proof. Dr. Palmer does not testify on the basis of the collective view of his scientific discipline, nor does he take issue with his peers and explain the grounds for his differences. Indeed, no understandable scientific basis is stated. Personal opinion, not science, is testifying here. Dr. Palmer's own expressed skepticism as to the value of extrapolating human conclusions from animal studies further confounds the issue. Upon analysis, we conclude that Dr. Palmer's conclusions go far beyond the known facts that form the premise for the conclusion stated. This conclusion so overstates its predicate that we hold that it cannot legitimately form the basis for a jury verdict. Beyond that Dr. Palmer's opinion testimony, to the extent that it is personal opinion as described above, is inadmissible. Fed.R.Evid. 703; see also Viterbo v. Dow Chem. Co., 826 F.2d 420, 423-24 (5th Cir.1987) (physician's unsupported personal opinion of causation held inadmissible), and Calhoun v. Honda Motor Co., 738 F.2d 126, 131-32 (6th Cir. 1984) (expert testimony must be based on the evidence, so as to be removed from the realm of guesswork and speculation).
We do not mean to intimate that animal studies lack scientific merit or power when it comes to predicting outcomes in humans. Animal studies often comprise the backbone of evidence indicating biological hazards, and their legal value has been recognized by federal courts and agencies. See, e.g., International Union, UAW v. Johnson Controls, Inc., ___ U.S. ___, 111 S.Ct. 1196, 1215, 113 L.Ed.2d 158 (White J., concurring) (citing Industrial Union Dep't v. American Petroleum Institute, 448 U.S. 607, 657 n. 64, 100 S.Ct. 2844, 2871 n. 64, 65 L.Ed.2d 1010 (1980)); Environmental Defense Fund, Inc. v. EPA, 548 F.2d 998, 1006-07 (D.C.Cir.1976); Proposed Guidelines for Assessing Female Reproductive Risk, 53 Fed.Reg. 24,834, 24,836-39 (1988) (discussing the use of animal studies to identify and assess reproductive hazards for human females); Proposed Guidelines for Assessing Male Reproductive Risk, 53 Fed.Reg. 24,850, 24,853-60 (1988) (discussing the use of animal studies to identify and assess reproductive hazards for human males).
Here, the record's explanation of the animal studies is simply inadequate. Although the animal studies themselves may have been scientifically performed, the exact nature of these tests is explained only in general terms. The record fails to make clear why the varying doses of Bendectin or doxylamine succinate given to the rats, rabbits and in vitro animal cells would permit a jury to conclude that Bendectin more probably than not causes limb defects in children born to mothers who ingested the drug at prescribed doses during pregnancy. The analytical gap between the evidence presented and the inferences to be drawn on the ultimate issue of human birth defects is too wide. Under such circumstances,
Accordingly, the judgment of the District Court is AFFIRMED.
Footnotes
To gauge the reliability and credibility of their reports when repeated randomly, statisticians use a device known as the confidence interval. The confidence interval is not a "burden of proof" in the legal sense; rather, it is a common sense mechanism upon which statisticians rely to confirm their findings and to lend persuasive power within their profession. The confidence interval has two components: a percentage, and an interval or range. The percentage part is established by the statistician in advance of performing the studies. Frequently this percentage is set at 95 percent, although that value is somewhat arbitrary and 85 or 90 percent figures are also used. The interval, on the other hand, represents a range of possible values at high and low ends of a scale of relative risk. See, e.g., Kenneth Rothman, Modern Epidemiology 119 (1986). At a 95 percent interval the true relative risk value will be between the high and low ends of the confidence interval 95 percent of the time. See Neil Cohen, Confidence in Probability: Burdens of Persuasion in a World of Imperfect Knowledge, 60 N.Y.U.L.Rev. 385, 398-400 (1985) [hereinafter Confidence in Probability], for examples of confidence intervals and their use.
To better understand confidence intervals, it may be helpful to picture a line, marked at hundredths intervals and extending from zero to infinity. The marking at 1.0 represents a relative risk of 1.0, the "null value." If a confidence interval of "95 percent between 0.8 and 3.10" is cited, this means that random repetition of the study should produce, 95 percent of the time, a relative risk somewhere between 0.8 and 3.10. Because this confidence interval includes relative risk values both less than and exceeding 1.0, the null value, a researcher cannot state that the results are statistically significant. David Kaye, Is Proof of Statistical Significance Relevant?, 61 Wash.L.Rev. 1333, 1343-44 (1986), cited in DeLuca v. Merrell Dow Pharmaceuticals, Inc., 911 F.2d 941, 948 (3rd Cir.1990). For an example of such a study, see Jose Cordero et al., Is Bendectin a Teratogen?, 245 JAMA 2307 (1981), cited infra. Similarly, it is possible that a range may be entirely below 1.0, meaning that the agent does not cause birth defects. If, however, the confidence interval spans a range entirely above 1.0 — e.g., from 1.75 to 10.75 — then this interval would be statistically significant and would show a greater likelihood that the suspected agent did cause the studied defect. For an example of such an interval, see Brenda Eskenazi & Michael Bracken, Bendectin (Debendox) As a Risk Factor for Pyloric Stenosis, 144 Am. J. Obstetrics & Gynecology 919 (1982), cited infra.
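The comparison of an interval against the null value can be illustrated with a short sketch. It is offered only as an illustration and is not drawn from the record: the function names are hypothetical, the cohort counts in the usage note are invented, and the interval is computed with the common log-scale approximation for a relative risk, which is an assumption about method rather than a description of how any study cited here was analyzed.

```python
import math

def relative_risk_ci(exposed_cases, exposed_total,
                     unexposed_cases, unexposed_total, z=1.96):
    """Return (relative risk, low end, high end) for a two-group cohort.

    Assumes the usual log-scale approximation; z=1.96 corresponds to a
    95 percent interval. All inputs are illustrative counts.
    """
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    rr = risk_exposed / risk_unexposed
    # Standard error of log(relative risk)
    se_log_rr = math.sqrt(1 / exposed_cases - 1 / exposed_total
                          + 1 / unexposed_cases - 1 / unexposed_total)
    low = math.exp(math.log(rr) - z * se_log_rr)
    high = math.exp(math.log(rr) + z * se_log_rr)
    return rr, low, high

def classify_interval(low, high, null_value=1.0):
    """Describe where an interval sits relative to the null value of 1.0."""
    if low > null_value:
        return "entirely above the null value"
    if high < null_value:
        return "entirely below the null value"
    return "includes the null value"

# The two intervals used as examples in the footnote text
print(classify_interval(0.8, 3.10))    # includes the null value
print(classify_interval(1.75, 10.75))  # entirely above the null value
```

With invented counts of 20 cases among 1,000 exposed and 10 cases among 1,000 unexposed, `relative_risk_ci(20, 1000, 10, 1000)` gives a relative risk of 2.0 whose 95 percent interval still reaches below 1.0, so a doubled point estimate alone would not be statistically significant at that sample size.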
The sample size for any study also has an effect, both on the confidence interval and the "power" of the study. Power is the study's probability of detecting a difference in outcomes between exposed and nonexposed groups. See Office of Technology Assessment, Report No. OTA-BA-266, Reproductive Health Hazards in the Workplace 166 (1985). The higher the study's power, the stronger are its conclusions and findings regarding its outcome. If a sample population is small, however, the power of the study will likely be less. The information behind the study is less, and the confidence interval will likely span a wider range for a smaller sample group than for a larger one. The power is less for the smaller group than for the larger group, even though the confidence interval may still be set at 95 percent, because the smaller study's predictive value is lessened by a wide confidence interval range. A statistician could accordingly describe the probability of choosing an expected outcome in the larger study as being greater than in the smaller study. Cohen, Confidence in Probability, at 398-99.
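The effect of sample size on the width of the interval can be sketched numerically. The example below is a hypothetical illustration only: the function name, the assumed risks of 2 percent among the exposed and 1 percent among the unexposed, and the group sizes are all invented, and the log-scale approximation for the interval is an assumed method. It holds the underlying risks fixed and varies only the number of subjects per group.

```python
import math

def rr_interval_for_size(n_per_group, risk_exposed=0.02,
                         risk_unexposed=0.01, z=1.96):
    """Return (low, high) of a 95 percent interval for the relative risk.

    Uses expected case counts for a study with n_per_group subjects in
    each of the exposed and unexposed groups. All figures are invented.
    """
    exposed_cases = risk_exposed * n_per_group
    unexposed_cases = risk_unexposed * n_per_group
    rr = risk_exposed / risk_unexposed
    # Standard error of log(relative risk); it shrinks as n grows
    se_log_rr = math.sqrt(1 / exposed_cases - 1 / n_per_group
                          + 1 / unexposed_cases - 1 / n_per_group)
    low = rr * math.exp(-z * se_log_rr)
    high = rr * math.exp(z * se_log_rr)
    return low, high

for n in (1_000, 10_000):
    low, high = rr_interval_for_size(n)
    print(f"n per group = {n}: interval {low:.2f} to {high:.2f}")
```

Under these assumptions the same relative risk of 2.0 yields an interval that spans 1.0 in the smaller study but lies entirely above 1.0 in the larger one, which is the sense in which the smaller study has less power to detect the same difference.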
As can be seen, in many aspects, the concepts of confidence intervals, sampling sizes, population and power are mutually interdependent. For an overview of how epidemiology is used in risk assessment, see Proposed Guidelines for Assessing Female Reproductive Risk, 53 Fed. Reg. 24,834, 24,840-41 (1988).