GARRETT E. BROWN, Jr., District Judge.
This action arises out of birth defects suffered by the plaintiff Amy DeLuca. Amy DeLuca brought suit through her mother and guardian ad litem, Cindy DeLuca, who, with her husband, joined as plaintiffs in their individual capacities. Plaintiffs allege that the birth defects (limb reduction) suffered by Amy DeLuca were caused by her mother's exposure to Bendectin, an anti-nausea drug produced by the Defendant Merrell Dow Pharmaceuticals, Inc. ("Merrell Dow").
By Memorandum and Order dated June 7, 1990, this Court granted defendant Merrell Dow's motion for summary judgment.
Plaintiffs appealed and the Court of Appeals reversed and remanded the action for further proceedings. DeLuca by DeLuca v. Merrell Dow Pharmaceuticals, Inc., 911 F.2d 941 (3d Cir.1990). The issues presently before this Court are three: (1) whether Dr. Done's testimony should be excluded under Rule 703 because the data upon which he relies are not of a type that experts in the field of epidemiology would rely upon, id. at 952-54; (2) whether Dr. Done's testimony should be excluded from evidence under Rule 702 on the grounds that (a) his methodology is unreliable, or (b) it would overwhelm, confuse or mislead the jury, id. at 954-57; and (3) if Dr. Done's testimony is admissible, whether under applicable New Jersey law the evidence relevant to causation would permit a jury finding that Amy DeLuca's birth defects were caused by her mother's exposure to Bendectin. Id. at 957-59.
Consistent with that opinion, this Court conducted a hearing on five separate days, followed by extensive post-hearing submissions of the parties, in order to determine whether Dr. Done's testimony is admissible under the criteria set forth in United States v. Downing, 753 F.2d 1224 (3d Cir.1985). The record before this Court consists of written direct testimony submitted by the parties and oral cross-examination and re-direct examination of the witnesses. Plaintiffs' experts were Dr. Alan K. Done, M.D. and Dr. Shanna Swan, Ph.D. Defendants submitted the expert testimony of Dr. Richard R. Monson, M.D., Sc.D., Dr. Nicholas H. Wright, M.D., M.P.H., Dr. Steven H. Lamm, M.D., Dr. Gerald A. Faich, M.D., M.P.H., Dr. Pauline Brenholz, M.D. and Dr. Paul Stanley Lietman, M.D., Ph.D. This Opinion constitutes my Findings of Fact and Conclusions of Law.
I. Causes and Incidence of Birth Defects
A. Causes of Birth Defects
1. The majority (approximately 65%) of birth defects in general are of unknown origin. Brenholz Direct at 3; Done Test., Tr. 7/10/91, at 29.
2. Possible causes of birth defects include genetic factors (chromosomal abnormalities or mutant genes) and environmental factors (diet, drug exposure, infections, x-rays and the like). Malformations may be multifactorial, meaning they are likely caused by a combination of genetic and environmental factors. Brenholz Direct at 4.
3. Genetic factors are etiological agents that initiate mechanisms of malformation by biochemical or other means at the sub-cellular level. Genetic factors typically are inherited or arise as a new gene mutation or new chromosomal abnormality, but do not necessarily manifest themselves in each pregnancy or in every generation. Brenholz Direct at 4.
4. Environmental factors, or teratogenic agents, may induce congenital malformations when the tissues and organs are developing. Brenholz Direct at 4.
5. The fact that a chromosomal study is normal does not rule out genetic defects. Medical science has not yet developed tests for most genetic defects. Brenholz Test., Tr. 9/19/91, at 78-79.
6. Family history is similarly not determinative, as recessive genes can be inherited from generation to generation, but do not manifest themselves unless there is a union with a partner who also carries the same recessive gene. There is a one in four chance that such a condition will appear. Brenholz Test., Tr. 9/19/91, at 79-80.
B. Incidence of Birth Defects
7. Congenital malformations are as old as recorded medical history. There have been countless instances of similar limb defects reported prior to Bendectin's first having been sold in this country in 1957. Since the time that Merrell Dow
8. An analysis comparing the incidence of birth defects to the sale of Bendectin has shown that there is no association between the two. Lamm Direct at 5-6; Defendant's Ex. 50.
9. Data collected by the Centers for Disease Control ("CDC") in Atlanta, Georgia show that after Bendectin ceased to be marketed (when Bendectin had been off the market for about three years) there was a slightly greater increase in birth defects than when Bendectin was prescribed in approximately 25% of all pregnancies. Wright Direct at 48; Defendant's Ex. 45.
C. Evaluation of Possible Teratogens
10. Geneticists use the Catalog of Teratogenic Agents, written by Thomas H. Shepard, M.D., as a source of reference when consulting patients as to the probable outcome of a pregnancy. In the Catalog, Dr. Shepard characterizes substances as "proven", "possible" or "unlikely" teratogens. Bendectin is listed as "unlikely" to be a teratogen. Cigarette smoking is listed as a "possible teratogen." The question of whether an induced abortion affects subsequent pregnancies has been debated. Brenholz Direct at 5-6; Defendant's Ex. 51.
11. Geneticists also use Reprotox, a computerized teratogen registry of the Reproductive Toxicology Center in Washington, D.C. Reprotox contains accurate, objective, comprehensive information regarding potential teratogenic agents and offers summaries of relevant and important articles. The registry states that animal and epidemiologic studies demonstrate no association between Bendectin and adverse pregnancy outcomes in general, and limb defects specifically. Brenholz Direct at 7; Defendant's Ex. 53.
12. Dr. Done has offered no materials on teratogenicity which state to the contrary.
13. Dr. Done has failed to explain how he himself ruled out Mrs. DeLuca's prior abortion and cigarette smoking as possible causes of Amy DeLuca's birth defects. Done Aff.; Done Test., Tr. 7/10/91; Brenholz Direct at 6-7.
14. Dr. Brenholz testified that pregnancy and smoking are always factors to be considered in determining the etiology of a birth defect. She also testified that a prior recent abortion and cigarette smoking could not be linked with a definite birth defect. Brenholz Test., Tr. 9/19/91, at 66-69.
II. Epidemiologic Studies on Bendectin
A. Principles of Epidemiology
15. Dr. Done placed emphasis in this case on epidemiologic studies, as there is almost universal agreement that the effects of drugs in human beings can best be evaluated by studying data concerning how those drugs did in fact affect persons who ingested them.
16. Plaintiffs originally relied secondarily on animal studies as well as epidemiologic studies. However, Magistrate Judge John Devine entered an order, which this Court affirmed on appeal, excluding from evidence all in vivo and in vitro animal studies. Plaintiffs have not challenged that ruling. Dr. Done also stated in his affidavit that he relied on structural activity considerations as well. His direct testimony,
17. Epidemiologic studies typically express their results in terms of relative risks or odds ratios. The relative risk or odds ratio compares the rate of disease in the exposed population to the rate of disease in the unexposed population. If the two rates are the same, then the ratio is one. Where there is no association between exposure and disease, one would expect to find studies yielding relative risks grouped around the number 1.0 — some less than 1.0 and some more than 1.0. Monson Direct at 20-22; Wright Direct at 11-12.
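By way of illustration only, the ratio described in finding 17 can be sketched in a few lines of Python. The function names and all counts below are hypothetical and are not drawn from any study in the record; the sketch simply shows that equal rates in the exposed and unexposed groups yield a ratio of one.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Rate of disease among the exposed divided by the rate among the unexposed."""
    exposed_rate = exposed_cases / exposed_total
    unexposed_rate = unexposed_cases / unexposed_total
    return exposed_rate / unexposed_rate


def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds of disease among the exposed divided by the odds among the unexposed."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)


# Hypothetical counts (assumptions for illustration, not record data).
same_rate = relative_risk(10, 1000, 20, 2000)     # 1% versus 1%
higher_rate = relative_risk(30, 1000, 20, 2000)   # 3% versus 1%
same_odds = odds_ratio(10, 990, 20, 1980)         # identical odds in both groups
```

Under these assumed counts, the first and third ratios come out at one (no association), while the second reflects a rate three times higher in the exposed group.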
18. The size of a study is one measure of its stability and power; thus, other things being equal, the larger the study, the greater its strength. Monson Direct at 22-24.
19. A confidence interval is a statistical calculation which provides information as to the stability of a relative risk calculation. A 95% confidence interval means that there is a 95% probability that the "true" relative risk falls within the interval. Monson Direct at 25; Wright Direct at 18. Most epidemiologists use a 95% interval; some use a 90% interval. Wright Direct at 21.
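For illustration, one common way to compute a confidence interval around a relative risk is the log-based normal approximation. The record does not state which formula or computer program any witness used, so the method, the function name and the counts below are all assumptions.

```python
import math
from statistics import NormalDist


def relative_risk_interval(exposed_cases, exposed_total,
                           unexposed_cases, unexposed_total, level=0.95):
    """Relative risk with an approximate interval on the log scale (assumed method)."""
    rr = (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)
    # Approximate standard error of ln(RR) for cohort-style counts.
    se = math.sqrt(1 / exposed_cases - 1 / exposed_total
                   + 1 / unexposed_cases - 1 / unexposed_total)
    # Two-sided critical value, e.g. about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Hypothetical counts, not taken from any Bendectin study.
rr, lower, upper = relative_risk_interval(12, 1000, 20, 2000)
```

With these assumed counts the point estimate is 1.2 and the interval spans 1.0, which is the pattern the findings describe for a result that is not statistically significant.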
B. The Bendectin Studies
20. Bendectin is one of the most extensively studied drugs in history. Dr. Done listed forty-two entries on Table 1 included as part of Exhibit B to his affidavit. That list contained thirty-one published studies or reports on Bendectin, six studies that did not address Bendectin, one unpublished reanalysis of an existing work, two unpublished preliminary drafts, one additional unpublished report, and one unpublished analysis of Food & Drug Administration ("FDA") "adverse drug reaction reports" or "drug experience reports" ("ADRs" or "DERs") conducted by Dr. Done himself. Done Aff., Ex. B.
21. Dr. Done conceded that there are no published studies showing a statistically significant association between Bendectin exposure and the development of limb reduction defects. Done Test., Tr. 7/10/91, at 27.
22. Dr. Done conceded that in the medical literature there is no established association to a statistically significant degree between exposure to Bendectin and an increased incidence of the specific type of Amy DeLuca's birth defects. Done Test., Tr. 7/10/91, at 29.
23. None of the authors of the thirty-one published studies included by Dr. Done on his chart concluded that a causal association between Bendectin ingestion and birth defects had been shown, and none of the published literature found any statistically significant association between Bendectin and the type of birth defects suffered by Amy DeLuca. See Plaintiffs' Exs. 1-43.
24. Dr. Done conceded that the authors of the published epidemiologic literature have concluded that their studies failed to demonstrate an association between Bendectin and limb reduction defects. Done Test., Tr. 7/10/91, at 27.
25. In 1982 and 1983, the FDA conducted a detailed review of Bendectin and released a report, continuing to approve the sale of Bendectin. The FDA concluded: "We do not however, believe available information supports a conclusion that Bendectin is teratogenic in humans." Defendant's Ex. 20, at 34.
26. In 1984, the court-appointed expert in the multi-district Bendectin litigation concluded that there was no evidence upon which to conclude that Bendectin caused birth defects. Lamm Direct at 4-5.
27. In the Notice of Proposed Rulemaking included in the Federal Register at Volume 52, No. 163 (August 24, 1987), the FDA concluded: "The agency has reviewed extensive data concerning the possible teratogenicity of doxylamine succinate and concludes that it is unlikely that this ingredient is a teratogen." Defendant's Ex. 19, at 31905.
28. Dr. Done concluded, based on his methodology, that a grouping of the data from all studies conclusively established that there was in fact an association between
III. The Done Methodology: Dr. Done's Reanalysis of Epidemiologic Studies
29. In his affidavit, Dr. Done stated the bases for his conclusion that Bendectin caused the congenital limb defect suffered by Amy DeLuca were his "knowledge of the properties of Bendectin, the compatibility of the timing of Bendectin exposure in this case with the particular defect, the fact that the defect is of the type for which there is substantial evidence of Bendectin causation, the absence of another more likely cause in this case, evidence of teratogenicity of Bendectin from animal teratogenicity and in-vitro mechanistic studies, and the ample evidence of human teratogenicity of Bendectin from the epidemiologic studies analyzed in Exhibit "B"." Done Aff. at 2. With the exception of epidemiologic studies, Dr. Done provided no supporting data or explanations for the other bases upon which he relied.
30. In his affidavit, Dr. Done stated: "Proof of causation can never come from epidemiologic studies, even with extensive replication." Done Aff. at 3.
A. Dr. Done's Data Sheets
31. Dr. Done purports to have taken the numbers he entered in the boxes on his chart from the underlying studies themselves or, for articles in which no calculations were made, to have calculated the numbers himself. In many cases, this is simply not true.
32. Dr. Shanna Swan, plaintiffs' other expert, did not independently verify the data included on Dr. Done's chart, did not check his calculations and did not check to see if Dr. Done correctly extracted the data from the articles as to which he made no calculations. Swan Test., Tr. 7/12/91, at 7-14.
B. Dr. Done's Methodology of Calculation
33. The calculation of a relative risk is a simple arithmetic calculation of the rate of occurrence in the exposed population compared to the rate of occurrence in the unexposed population. Done Test., Tr. 7/10/91, at 142; Wright Direct at 11; Monson Direct at 21-22. Relative risk calculations are not subject to variations other than those attributable to round-off techniques, provided the correct data is selected. Monson Direct at 11. The calculation of confidence intervals, on the other hand, is a more difficult calculation and may vary depending on the type of computer program used. Monson Direct at 11; Done Aff., Ex. B.
34. During cross-examination, Dr. Done explained how he obtained the data he used in making the calculations he entered on the chart. Done Test., Tr. 7/10/91, at 114-34. With respect to the Newman and Greenberg studies, Dr. Done admitted that the numbers were transposed. Done Test., Tr. 7/10/91, at 130.
35. Defense experts, Drs. Monson, Wright and Lamm, have itemized numerous errors made by Dr. Done in the calculation of the "data sets"
36. Drs. Monson, Wright, Lamm and Swan, all qualified epidemiologists, in many cases could not replicate Dr. Done's recalculations. Monson Direct at 27-28; Wright Direct at 33; Lamm Direct at 15-18; Swan Test., Tr. 7/12/91, at 12, 20, 25-26, 28-29 & 33.
37. Dr. Done's chart listed a relative risk of 8.8 with confidence intervals of 2.0 to 3.9 with respect to limb reduction defects for the Jick '80 study (fourth entry on Dr. Done's chart). Done Aff., Ex. B. This is incorrect since the relative risk must fall between the confidence limits. Done Test., Tr. 7/10/91, at 93-95; Swan Test., Tr. 7/12/91, at 11-12.
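The internal inconsistency noted in finding 37 can be expressed as a simple check. The sketch below uses the figures stated in that finding; the function name is hypothetical.

```python
def estimate_within_limits(point_estimate, lower_limit, upper_limit):
    """A point estimate should lie between its own confidence limits."""
    return lower_limit <= point_estimate <= upper_limit


# Chart entry described in finding 37: relative risk 8.8, limits 2.0 to 3.9.
jick_entry_consistent = estimate_within_limits(8.8, 2.0, 3.9)
```

The check fails for this entry because 8.8 lies above the stated upper limit of 3.9.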
38. The Shapiro and Heinonen publications of the same study presented a standardized relative risk of 1.15 for musculoskeletal malformations. Plaintiffs' Ex. 13, Table 2, at 482; Plaintiffs' Ex. 14, Table 23.5, at 327. Dr. Done's chart listed two separate non-standardized
39. Dr. Done calculated the 1.6 figure in the Shapiro '77 study by adding club feet to the author's category of musculoskeletal defects. Done Test., Tr. 7/10/91, at 122-25; Monson Direct at 14.
40. Although Dr. Swan listed a relative risk of 0.67 to 1.0 in her reanalysis for women under the age of thirty who ingested the same two-ingredient Bendectin as did Mrs. DeLuca, Dr. Done listed a relative risk of 2.4 on his chart. Compare Done Aff., Ex. B. with Plaintiffs' Ex. 28. Dr. Swan admitted that she would not place much weight on this portion of her reanalysis because the numbers are so small. Swan Test., Tr. 7/12/91, at 44.
41. The Bannister study did not contain adequate information on control subjects to calculate a relative risk. Wright Direct at 37. Even if one were to make certain assumptions regarding the data, there is no known methodology which could yield a relative risk of 13, the number which Dr. Done entered on his chart. Monson Direct at 12; Lamm Direct at 16. Dr. Done testified that he calculated the relative risk of 13 by taking 3/23 as the numerator and dividing by 1/28. Done Test., Tr. 7/10/91, at 116-18. Simple arithmetic reveals, however, that 3/23 divided by 1/28 equals 3.65, not 13. Possible relative risks are either 1.2 or 1.4. Monson Direct at 12-13. Dr. Done previously testified in another trial that the relative risk was 1.4. Done Test., Tr. 7/10/91, at 118-19.
42. Dr. Swan could not explain how Dr. Done calculated a relative risk of 13 for the Bannister study. Swan Test., Tr. 7/12/91, at 20-26.
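The arithmetic recited in finding 41 can be verified directly. The following sketch uses exact fractions so that no rounding enters until the final step; the variable names are illustrative.

```python
from fractions import Fraction

# Figures Dr. Done testified to using: 3/23 divided by 1/28.
numerator = Fraction(3, 23)
denominator = Fraction(1, 28)

quotient = numerator / denominator      # exact value, 84/23
rounded = round(float(quotient), 2)     # rounded to two decimal places
```

The exact quotient is 84/23, which rounds to 3.65, consistent with the figure stated in finding 41 and not with the 13 entered on the chart.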
43. For the Saxen study, Dr. Done listed a relative risk of 4.6 for all defects. Done Aff., Ex. B; Plaintiffs' Ex. 8. The data for all defects were not presented in the underlying study. Lamm Direct at 16; Defendant's Ex. 48. Dr. Done explained, however, that the "all" category included all defects of other kinds occurring in people who also had cleft lip or palate. Done Test., Tr. 7/10/91, at 97.
45. Dr. Swan could not give an opinion as to whether Dr. Done used an accepted methodology in calculating his relative risk of 11.4. Swan Test., Tr. 7/12/91, at 34-36.
46. With respect to the Nelson study, Dr. Done calculated a relative risk of 7.0 for the all defects category, a number arrived at by adding one defect to the unexposed group (since there were no actual defects reported and dividing by zero would yield infinity). Done Test., Tr. 7/10/91, at 115-16.
47. Dr. Monson testified that there are no choices of data which would yield a relative risk of 7.0. Monson Direct at 27; Defendant's Ex. 48.
48. Dr. Swan could not confirm Dr. Done's relative risk of 7.0 and in fact herself calculated a relative risk of 0.91. Swan Test., Tr. 7/12/91, at 29.
49. Dr. Done admitted that he transposed the entries for the Newman and Greenberg studies on his chart and that he noticed this during cross-examination. Done Test., Tr. 7/10/91, at 130-32.
50. The odds ratio of 3.3 reported by Dr. Done for limb reduction defects in the Kullander study related to women who took promethazine (an ingredient not found in Bendectin) and only to children with congenital dysplasia of the hip (a condition Amy DeLuca does not have). Lamm Direct at 16-17.
51. Dr. Wright similarly pointed out that the Kullander study dealt with the study of all antiemetics and not with Bendectin at all. Wright Direct at 29.
52. Dr. Done calculated an odds ratio of 2.0 for the all defects category in the Smith study. Although Dr. Done claimed to have compared the normal controls with the anomalous controls, Done Test., Tr. 7/10/91, at 96, Dr. Lamm testified that there were no data from which to make this calculation in the underlying study. Lamm Direct at 17; Monson Direct at 27; Defendant's Ex. 48. In addition, Dr. Lamm testified that the Smith study did not specifically address Bendectin and should have been excluded. Lamm Direct at 17; Monson Direct at 27; Defendant's Ex. 48.
53. The Porter study gave an odds ratio of 1.83 for malformations and Bendectin, Plaintiffs' Ex. 43, at 1430 Table 3, while Dr. Done recalculated an odds ratio of 2.3 for all defects. Done Test., Tr. 7/10/91, at 134. Dr. Lamm confirmed the author's odds ratio of 1.83 and did not know how Dr. Done calculated the number 2.3. Lamm Direct at 18; Wright Direct at 27-28.
54. Dr. Swan could not confirm Dr. Done's calculation of 2.3, but rather herself calculated an odds ratio of 1.82 using unmatched controls. Swan Test., Tr. 7/12/91, at 20.
C. Methodology of Presentation
55. Dr. Done presented percentage numbers of studies that are compatible with an increased risk of birth defects, but failed to inform the reader that the same studies were also compatible with a decreased risk of birth defects. Wright Direct at 34-35.
i. Inclusion of Data
56. Although Dr. Done has prepared an amended graph excluding those studies which did not address Bendectin, his original chart included studies which did not specifically address Bendectin. Done Aff., Ex. B.; Plaintiffs' Exs. 4, 6, 8, 9, 11 & 12; Monson Direct at 5; Wright Direct at 6; Lamm Direct at 12.
57. Dr. Swan, plaintiffs' other expert, agreed with defendant's experts that studies not specifically addressing Bendectin should have been excluded from any consideration of Bendectin's alleged teratogenicity. Swan Aff., ¶ 3(c).
58. Dr. Done admitted that exclusion of non-Bendectin studies would be more reliable in considering the risks of Bendectin. Done Test., Tr. 7/10/91, at 90.
60. Dr. Done included the Shapiro/Heinonen study twice, although it is only one single study printed in two separate versions. Plaintiffs' Exs. 13 & 14; Monson Direct at 6; Defendant's Ex. 46. He also listed different relative risks for limb defects, although the two papers were based on the same data. Done Aff., Ex. B. Dr. Done claims to have separated out limb reduction defects. Done Test., Tr. 7/10/91, at 121-26.
61. Dr. Done included on his chart a number of preliminary drafts that were later replaced by finalized published studies. Compare Plaintiffs' Ex. 19 with Plaintiffs' Ex. 31; Compare Plaintiffs' Ex. 20 with Plaintiffs' Ex. 39; Monson Direct at 6-8.
62. The Jick draft included by Dr. Done on his chart was specifically labelled as a preliminary draft by the author himself. Plaintiffs' Ex. 20 (emphasis in original); Wright Direct at 32-33. A subsequent affidavit submitted by Dr. Jick explained that his preliminary draft contained errors that had a tendency to create a bias and those errors were later corrected. Defendant's Ex. 49. Dr. Done admitted the unreliability of the Jick draft. Done Test., Tr. 7/10/91, at 130.
63. Dr. Done relied on preliminary drafts in spite of his admission that his own earlier reanalysis submitted at his deposition contained numerous errors that were corrected in his final version. Done Test., Tr. 7/10/91, at 53-54.
64. Dr. Done's earlier draft contained variances in "data sets" for at least twenty of the studies on his chart. Done Test., Tr. 7/10/91, at 53-80. Dr. Done attributed the changes on his chart to the fact that he obtained a computer in the interim, obtained additional studies, etc. Done Test., Tr. 7/10/91, at 78-80.
65. Neither Dr. Swan's reanalysis of the Cordero data nor Dr. Done's analysis of DERs or ADRs has ever been submitted for publication. Monson Direct at 7. Neither Dr. Swan nor Dr. Done provided any explanation for not having published their respective analyses.
66. Dr. Done entered a relative risk of 2.0 for limb defects in the Michaelis '83 study, but there were no data given in that article from which to make such a calculation. Monson Direct at 18; Plaintiffs' Ex. 31.
67. Dr. Done calculated a relative risk of 3.3 for musculoskeletal defects for the Kullander study, which did not address Bendectin specifically and did not separately break out musculoskeletal defects. Wright Direct at 29; Plaintiffs' Ex. 11.
68. Dr. Done's review of DER surveillance data did not constitute a finished epidemiologic study. Wright Direct at 33. It cannot be used by itself to prove causation, but rather is merely a stimulus for further study. Done Test., Tr. 7/10/91, at 101.
69. With respect to Dr. Done's DER analysis (or ADR analysis), even if ADR or DER information were accurately reported, ADRs have inherent biases as they are second- or third-hand reports, are affected by medical or mass media attention, and are subject to other distortions. Faich Direct at 5.
70. At the time when Dr. Done reviewed the ADRs for Bendectin, there were inaccuracies, inconsistencies and duplications in the FDA records. Faich Direct at 6-7.
71. Dr. Done conceded that DERs generally can include lawsuits, medical journal articles and news accounts, and that the reporting of DERs by physicians is far from complete and total, and that in fact less than 10% of what should be reported is reported. Done Test., Tr. 7/10/91, at 99-101.
73. Dr. Done further admitted that he cannot now reproduce lists of the DERs he reviewed, and thus, cannot verify his data. Done Test., Tr. 7/10/91, at 101-02.
74. An FDA report has concluded that "ADR reports have in fact not been useful beyond a role as stimulators of interest." Defendant's Ex. 20, at ¶ 2.
75. ADRs or DERs are not of a type of data that are reasonably relied upon by experts in the fields of epidemiology and public health to make a determination of the causal relationship between a given substance and human birth defects. Faich Direct at 6 & 10.
ii. Exclusion of Data
76. Dr. Done did not include data from a large recent study by Shiono, which concluded there was no demonstrated association between Bendectin and birth defects. Done Aff., Ex. B.; Lamm Direct at 11; Wright Direct at 10-11; Defendant's Ex. 38.
77. Dr. Done also did not include any data from the 1991 Erickson article, which yielded an odds ratio of 0.87 for all defects. Wright Direct at 10-11; Defendant's Ex. 40, Table 3, at 46.
78. Dr. Done did not know of any post-1986 studies such as Shiono and Erickson, as he testified that he included all he knew about. Done Test., Tr. 7/10/91, at 67. Both articles were published in Teratology, a peer-reviewed scientific journal. Defendant's Exs. 38 & 40.
79. Dr. Done did not include on his chart the relative risk for limb defects (the type of defects suffered by Amy DeLuca) from the Gibson study, even though it was presented in the published article. Done Aff., Ex. B.; Lamm Direct at 11; Plaintiffs' Ex. 25.
80. Dr. Swan confirmed that the Gibson article contained data for a relative risk of 0.84 for limb reduction defects, although the relative risk was not included by Dr. Done on his chart. Swan Test., Tr. 7/12/91, at 29-31.
iii. Selectivity of Data
81. Although the Jick '80 draft, Plaintiffs' Ex. 20, contained at least four data sets, Dr. Done, with no explanation, ignored the one most favorable to a lack of association. Monson Direct at 14-16; Done Aff., Ex. B.
82. Dr. Done testified that he included club feet in the Heinonen study, thus raising the non-standardized relative risk from 1.4 to 1.6. Done Test., Tr. 7/10/91, at 124-25. Dr. Done did not, however, use that same method and similarly include club feet in deriving the relative risk of 4.2
83. In the Aselton '83 study, Plaintiffs' Ex. 36, the possible relative risks from the two data sets are 1.4 and 0.67, or a combined value of 0.9. Dr. Done selected only the highest value of 1.4 for his chart. Monson Direct at 19; Wright Direct at 39.
iv. Weighing of Data
84. One important consideration in analyzing a study's power is the number of exposed defects. Monson Direct at 23. The study having the largest number of exposed defects is the McCredie study. Monson Direct at 23; Defendant's Ex. 47. The relative risk for this study is 1.1, an almost perfect example of a lack of an association. Dr. Done does not give this study any more weight than any of the other studies.
85. The study showing the second largest number of exposed limb defects is the Gibson study. Defendant's Ex. 47. Dr.
86. The next two strongest studies in terms of the number of exposed defects are Cordero and Shapiro/Heinonen, with exposed defects of 43, 24, 14 and 13. Cordero listed a relative risk of 1.2, and the Shapiro/Heinonen study listed a standardized relative risk of 1.15 and a non-standardized relative risk of 1.4. Monson Direct at 24. Dr. Done has not given these studies any more weight than the Eskenazi or Aselton '85 studies, which have only one exposed defect in each study. Monson Direct at 24.
87. The 106 data sets presented in Dr. Done's chart do not represent 106 studies, but rather calculations from approximately forty individual studies. If a particular study calculates or presents information for a number of different classes of birth defects, then Dr. Done considers that to be that number of different data sets. Thus, a weak study in which the author presents data on five different categories of birth defects will be given five times as much weight on Dr. Done's chart as a study where the author evaluated only one type of birth defect. Wright Direct at 30.
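The weighting effect described in finding 87 can be illustrated with a short count. The study names and defect categories below are hypothetical; the point is only that, when each category is treated as a separate data set, a study reporting five categories occupies five times as many rows as a study reporting one.

```python
from collections import Counter

# Hypothetical chart rows: (study, defect category). Not taken from the record.
chart_rows = [
    ("Study A", "limb"), ("Study A", "cardiac"), ("Study A", "cleft"),
    ("Study A", "CNS"), ("Study A", "all"),
    ("Study B", "limb"),
]

# Count how many data sets each study contributes to the chart.
rows_per_study = Counter(study for study, _ in chart_rows)
weight_ratio = rows_per_study["Study A"] / rows_per_study["Study B"]
```

Here the ratio is five to one regardless of the size or quality of either hypothetical study.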
88. Dr. Done gave equal weight to his data and calculations concerning birth defects not at issue here and to data concerning limb reduction defects at issue here. Wright Direct at 32; Done Aff., Ex. B.
89. There is nothing in the record indicating that Dr. Done has considered study design or control for bias and confounding, although he indirectly considered statistical power through presentation of confidence intervals. Dr. Swan confirmed that she did not know whether Dr. Done considered study design and control for bias and confounding. Swan Test., Tr. 7/12/91, at 48-49.
90. Dr. Done espoused collective evaluation of data, that is, re-evaluating all the underlying data he gathered from the studies. Done Aff., Ex. B. Dr. Done, however, reached no quantitative conclusion concerning his re-evaluation.
91. One possible quantitative collective conclusion, suggested by Dr. Lamm, is a combined relative risk called a "pooled odds ratio" calculated according to the Mantel-Haenszel statistical method. The pooled odds ratio calculated by Dr. Lamm for limb reduction defects is 1.08, and for all malformations is 0.98. Lamm Direct at 14. These results demonstrate an absence of an association between limb reduction defects or all malformations and maternal ingestion of Bendectin.
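As a minimal sketch of the Mantel-Haenszel method named in finding 91, the pooled odds ratio sums a weighted numerator and denominator across studies. The function name and the two-by-two tables below are hypothetical and do not reproduce Dr. Lamm's data or his figures of 1.08 and 0.98.

```python
def mantel_haenszel_odds_ratio(tables):
    """Pooled odds ratio over 2x2 tables given as (a, b, c, d):
    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d            # total subjects in this study
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Hypothetical studies (assumptions for illustration only).
no_association = [(10, 990, 20, 1980), (5, 495, 10, 990)]   # each odds ratio is 1.0
mixed = [(20, 80, 10, 90), (10, 990, 20, 1980)]             # odds ratios 2.25 and 1.0

pooled_null = mantel_haenszel_odds_ratio(no_association)
pooled_mixed = mantel_haenszel_odds_ratio(mixed)
```

Because the pooled value is a weighted average of the individual odds ratios, larger studies pull the result toward their own estimates; in the second hypothetical set the pooled value falls between 1.0 and 2.25.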
v. Presentation of Confidence Intervals
92. A study having a smaller confidence interval has more value than a study with a larger confidence interval. Done Test., Tr. 7/10/91, at 30.
93. Large confidence intervals show up as more black on Dr. Done's bar graph. This gives the reader the erroneous impression that the studies with large confidence intervals should be given more weight. Wright Direct at 46. The length of the line is, however, a measure of the lack of precision of a study. Lamm Test., Tr. 10/9/91, at 27.
94. In the analysis of Bendectin limb defect studies, the choice of a confidence interval of 90% or 95% does not change the result if that confidence interval contains the number 1.0. Wright Direct at 22; Defendant's Ex. 47.
95. Dr. Done's bar graph of confidence intervals mixed 90% and 95% confidence intervals without designating which were which. When presented visually on a bar graph, a 95% confidence interval will always be wider than a 90% confidence interval for the same study. This leads to an unclear and inconsistent presentation of data. Wright Direct at 21-22.
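The relationship between 90% and 95% intervals noted in finding 95 can be sketched as follows, assuming a log-based normal approximation. The relative risk and standard error are hypothetical, and the record does not identify the formula used for any bar on Dr. Done's graph.

```python
import math
from statistics import NormalDist


def log_scale_interval(rr, standard_error, level):
    """Interval around a relative risk on the log scale (assumed method)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # two-sided critical value
    return (math.exp(math.log(rr) - z * standard_error),
            math.exp(math.log(rr) + z * standard_error))


# Hypothetical inputs for a single study.
rr, se = 1.2, 0.36
low90, high90 = log_scale_interval(rr, se, 0.90)
low95, high95 = log_scale_interval(rr, se, 0.95)
```

For the same assumed study, the 95% interval extends beyond the 90% interval at both ends, so bars drawn at mixed levels without labels are not directly comparable.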
96. Dr. Done referred to studies in which the confidence intervals included 1.0 as "inconclusive." Although technically accurate, use of the word inconclusive may suggest to those unfamiliar with statistics and epidemiology that there was some deficiency in the study that makes one unable to form any conclusions whatsoever.
97. Dr. Done's statement that 70% of his data sets have an upper confidence level above 2.0 is misleading without the corresponding information that 94% have lower confidence limits below 2.0 and only 30% of his data sets have a relative risk greater than 2.0. Wright Direct at 35-36.
vi. Inconsistencies Between Graphs and Chart
98. The bar graphs do not accurately portray all of the information contained on Dr. Done's chart.
99. Dr. Done's chart contained an entry for the Eskenazi study showing a relative risk of 4.2 and a confidence interval of 0.5 to 36 for limb reduction defects. There was no entry on Dr. Done's bar graph corresponding to this data. Wright Direct at 7; Compare Done Chart with Done Bar Graph. The closest entry ranges from 0.5 to 44, thus skewing the data in favor of Dr. Done's opinion. Wright Direct at 6-7.
100. There was similarly no line accurately depicting Dr. Done's calculation for diaphragmatic hernia in the Heinonen study. Wright Direct at 8-9. Dr. Done's chart listed a relative risk of 11.0 with confidence limits of 4.3 to 28 for the Heinonen study. The corresponding line on the bar graph accurately portrayed the lower confidence limit, but exaggerated the upper confidence limit to well over 30. Compare Done Chart with Done Bar Graph.
101. At least one of the calculations in the Jick '80 draft included on Dr. Done's chart was not accurately portrayed on his bar graph. Wright Direct at 8. Dr. Done's chart listed a relative risk of 8.8 with confidence limits of 2.0 to 3.9 for limb reduction defects. While such a calculation is inaccurate (because 8.8 does not fall between 2.0 and 3.9), the corresponding line on the bar graph runs from a little under 2.0 to approximately 48. Compare Done Chart with Done Bar Graph.
102. Dr. Swan did not know which data points on Dr. Done's chart corresponded to which lines on his bar graph. Swan Test., Tr. 7/12/91, at 36-37.
103. On Dr. Done's own chart purporting to rank the studies by upper limit of confidence interval, Defendant's Ex. 42, there was no number 13, two numbers 14 (Jick '81 — all defects and Cordero '81 — CNS), two numbers 68 (Shapiro '77 — eye and ear and Michaelis '83 — musculoskeletal), no number 69, two numbers 82 (Saxen '74 — all and Eskenazi '82 — esophageal), no number 85, and no number 94. Wright Direct at 9.
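The internal inconsistencies described in this subsection are of a kind that can be checked mechanically. The following Python sketch is offered as an illustration under stated assumptions: the function names are hypothetical, the three chart entries are those quoted in Findings 99 through 101, and the short ranking list is an invented example that merely mirrors the pattern described in Finding 103 (a missing number followed by a duplicated number). It does not reproduce the full chart.

```python
from collections import Counter


def estimate_within_limits(relative_risk, lower, upper):
    """A point estimate should lie between its own confidence limits."""
    return lower <= relative_risk <= upper


def rank_problems(ranks):
    """Return (duplicated ranks, missing ranks) for a list of integer ranks."""
    counts = Counter(ranks)
    duplicates = sorted(r for r, n in counts.items() if n > 1)
    expected = set(range(min(ranks), max(ranks) + 1))
    missing = sorted(expected - set(ranks))
    return duplicates, missing


# (relative risk, lower limit, upper limit) as listed on the chart.
entries = {
    "Eskenazi, limb reduction": (4.2, 0.5, 36),
    "Heinonen, diaphragmatic hernia": (11.0, 4.3, 28),
    "Jick '80 draft, limb reduction": (8.8, 2.0, 3.9),
}
for name, (rr, lower, upper) in entries.items():
    print(name, estimate_within_limits(rr, lower, upper))

# Hypothetical excerpt of a ranking: no 13 and two 14s.
print(rank_problems([12, 14, 14, 15]))
```

Only the Jick '80 draft entry fails the first check, because 8.8 does not fall between 2.0 and 3.9.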
D. Dr. Done's Qualifications
104. Dr. Done has had no formal training in a degree program in epidemiology. Done Test., Tr. 7/10/91, at 12.
105. Dr. Done does not have a Bachelor or Doctorate degree in epidemiology nor has he undertaken any fellowship in epidemiology. Done Test., Tr. 7/10/91, at 12.
106. Dr. Done is not a member of the American College of Epidemiology or the Society for Epidemiologic Research. Done Test., Tr. 7/10/91, at 12.
107. Dr. Done is not board certified in obstetrics or genetics. Although he is board certified in pediatrics, Done Test., Tr. 7/10/91, at 7, Dr. Done has never had a private practice in pediatrics, and he has not had admitting privileges in any hospital since 1983. Done Test., Tr. 7/10/91, at 8 & 13.
E. Dr. Done's Conclusions and Worksheets
108. Although Dr. Done stated in his report that "92% of the studies are compatible with an increase," he did not mention in his report that the studies were also compatible with a decrease (a proposition which he readily admits). Done Test., Tr. 7/10/91, at 46.
110. On one of Dr. Done's charts, he ranked the studies by length of confidence interval as a measure of the strength of the data — the shortest confidence intervals indicating the strongest data. Defendant's Ex. 42; Wright Direct at 15.
111. The studies Dr. Done considered to be the strongest all yielded relative risks grouped around the number 1.0. Wright Direct at 15.
112. With respect to the all defects category, Dr. Done rated the Shapiro study as the strongest study, followed by the Gibson study and the GPRG study. Defendant's Ex. 42; Wright Direct at 15. The relative risks for these studies are 1.1, 1.1 and 0.7 respectively. Done Aff., Ex. B.
113. In the category of limb reduction defects, putting aside Dr. Done's own unpublished preliminary study, Dr. Done rated the Shapiro study as the strongest with a rating of 82, followed by the Cordero study and the McCredie study. Defendant's Ex. 42; Wright Direct at 15-16. The relative risks for these studies are 1.15 (according to the author; 1.6 according to Dr. Done), 1.2 and 1.1, respectively. Done Aff., Ex. B.
IV. The Done Methodology: Dr. Done's Structure Activity Theory
114. Dr. Done claims that a connection between Bendectin and birth defects is suggested because (1) the structure of Bendectin is chemically similar to the structure of antihistamines generally; (2) some antihistamines have been associated with birth defects; and (3) Bendectin is an antihistamine. Dr. Done has offered no evidence or data to support this theory.
115. Small changes in chemical structure can cause very different human effects. Lietman Direct at 10. For example, methanol and ethanol are chemically similar. Compare Defendant's Ex. 59 with Ex. 60. Yet, while ethanol is found in most alcoholic beverages, ingestion of methanol produces blindness. Lietman Direct at 10-11.
116. Experts in pharmacology do not rely upon structure activity relationships in making determinations as to causation of birth defects, as such information is highly unreliable for drawing conclusions about teratogenicity. Lietman Direct at 14.
V. Exclusion of Other Causes of Amy DeLuca's Birth Defects
117. In Dr. Done's affidavit, he stated that his opinion was based in part on "the absence of another more likely cause in this case." Done Aff. at 2.
118. There is no evidence in the record showing that Dr. Done himself ruled out any other likely cause of Amy DeLuca's birth defects.
119. Although chromosomal studies in this case proved negative, Dr. Done did not address the possibility of causation from a genetic standpoint.
120. Although Dr. Brenholz testified that she has no direct evidence that smoking or Mrs. DeLuca's prior abortion were in any way related to Amy's birth defects, Brenholz Test., Tr. 9/19/91, at 67-68, the standardized texts and literature consulted by Dr. Brenholz in her practice list smoking as a possible teratogen and abortion as a subject of debate. Brenholz Direct at 6-7. In his testimony, Dr. Done failed to address either as another more likely cause of Amy's birth defects, did not consider any other cause, and did not explain his reasoning for ruling out any other cause. Brenholz Direct at 6-7.
CONCLUSIONS OF LAW
1. This Court has diversity jurisdiction over this action pursuant to 28 U.S.C. § 1332.
II. The Downing Hearing
2. The hearing conducted by this Court on remand has produced a record sufficient under the criteria set forth in United States v. Downing, 753 F.2d 1224 (3d Cir. 1985) and DeLuca by DeLuca v. Merrell Dow Pharmaceuticals, Inc., 911 F.2d 941 (3d Cir.1990), to enable this Court to reach "the ultimate determination of whether [Dr. Done's testimony] is `helpful' and thus admissible," and to exercise its discretion in making that determination. DeLuca, 911 F.2d at 957 (citing United States v. Ferri, 778 F.2d 985, 989-91 (3d Cir.1985), cert. denied, 476 U.S. 1172, 106 S.Ct. 2896, 90 L.Ed.2d 983 (1986)).
III. Federal Rule of Evidence 702
3. Federal Rule of Evidence 702 provides:
4. Rule 702 embodies a strong and undeniable preference for admission of evidence having some potential to assist the trier of fact. DeLuca, 911 F.2d at 956.
5. Before a witness may testify under Rule 702, he or she must be sufficiently qualified.
6. Rule 702 permits the admission of expert testimony so long as that testimony is rendered by a qualified expert and is helpful to the trier of fact. DeLuca, 911 F.2d at 954 (citing American Technology Resources v. United States, 893 F.2d 651, 655 (3d Cir.), cert. denied, 495 U.S. 933, 110 S.Ct. 2176, 109 L.Ed.2d 505 (1990); Habecker v. Copperloy Corp., 893 F.2d 49, 51 (3d Cir.1990); Breidor v. Sears, Roebuck & Co., 722 F.2d 1134, 1138-39 (3d Cir.1983)).
7. Expert testimony that is based on unreliable methodology is unhelpful and thus excludable. DeLuca, 911 F.2d at 954; Downing, 753 F.2d at 1224.
8. "The reliability of expert testimony founded on reasoning from epidemiological data is generally a fit subject for judicial notice." DeLuca, 911 F.2d at 954.
9. Where expert testimony is based on arguably unreliable techniques or novel scientific evidence, courts must conduct a preliminary inquiry, focusing on (1) the soundness and reliability of the process or technique used in generating the evidence; (2) the possibility that admitting the evidence would overwhelm, confuse, or mislead the jury; and (3) the proffered connection between the scientific research or test result to be presented, and particular disputed factual issues in the case.
10. Where proffered statistical analysis is not novel, a Rule 702 hearing is not necessary to evaluate its reliability. DeLuca, 911 F.2d at 955 n. 15.
11. The mere fact that the great weight of scientific opinion leans against an opinion does not justify its exclusion. The degree to which adverse opinion dominates the relevant literature, however, is not wholly irrelevant. DeLuca, 911 F.2d at 955; Downing, 753 F.2d at 1238. Opinion contrary to Dr. Done's opinion that Bendectin is a teratogen clearly dominates the relevant literature, which almost universally finds no evidence of an increased risk of birth defects in conjunction with the use of Bendectin. See Findings of Fact supra at ¶¶ 20-28; see also DeLuca, 911 F.2d at 945-46 ("The great weight of scientific opinion, as evidenced by the FDA committee results, sides with the view that Bendectin use does not increase the risk of having a child with birth defects. Sailing
A. The Process and Techniques Used by Dr. Done in Generating the Evidence
12. The court in Downing identified several factors to be considered when evaluating the reliability and soundness of a particular technique or methodology: (i) the novelty of the technique and its relationship to more established modes of scientific analysis; (ii) the existence of specialized literature; (iii) the qualifications and professional stature of expert witnesses; (iv) the non-judicial uses to which the scientific techniques are put; and (v) the frequency with which a technique leads to erroneous results. Downing, 753 F.2d at 1238-39.
i. The Novelty of Dr. Done's Methodology and Its Relationship to More Established Modes of Scientific Analysis
13. Although the calculation of a relative risk is a simple arithmetic calculation, Done Test., Tr. 7/10/91, at 142; Wright Direct at 11; Monson Direct at 21-22, the defense experts as well as plaintiffs' co-expert, Dr. Swan, testified that the precise method used by Dr. Done in making some of his calculations was a mystery and did not conform to any known methodology. Drs. Monson, Wright, Lamm and Swan, all qualified epidemiologists, in many cases could not replicate Dr. Done's calculations. Monson Direct at 27-28; Wright Direct at 33; Lamm Direct at 15-18; Swan Test., Tr. 7/12/91, at 12, 20, 25-26, 28-29 & 33. Accordingly, this Court is uncertain as to the precise technique or methodology employed by Dr. Done in making the calculations included on his chart, and thus, I conclude that his methodology is indeed novel. This factor weighs against admissibility.
14. The strength of an epidemiologic study should be evaluated in addition to its results. DeLuca, 911 F.2d at 955. Among the strongest studies on limb defects were the McCredie and Gibson studies, those being the two studies with the greatest number of exposed defects. See Findings of Fact supra at ¶¶ 84-85. Those studies yielded relative risks of 1.1 and 0.8 respectively. The next two strongest studies in terms of the number of exposed defects yielded relative risks of 1.2 and 1.15. See Findings of Fact supra at ¶ 86. Dr. Done's failure to weight these studies more strongly weighs against admissibility.
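For readers unfamiliar with the arithmetic, the following Python sketch shows one common textbook way to compute a relative risk and an approximate 95% confidence interval from a two-by-two table, using a normal approximation on the logarithm of the relative risk. It is an illustration under stated assumptions and is not the method used by Dr. Done or by any expert in this case. The function name and the counts in the example are hypothetical and are not taken from any study in the record.

```python
import math


def relative_risk(exposed_cases, exposed_total,
                  unexposed_cases, unexposed_total, z=1.96):
    """Return (relative risk, lower limit, upper limit).

    Uses the standard log-scale approximation for the interval;
    z=1.96 corresponds to an approximate 95% confidence level.
    """
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    rr = risk_exposed / risk_unexposed

    # Approximate standard error of ln(RR).
    se_log_rr = math.sqrt(
        1 / exposed_cases - 1 / exposed_total
        + 1 / unexposed_cases - 1 / unexposed_total
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 10 defects among 1,000 exposed births and
# 8 defects among 1,000 unexposed births.
rr, lower, upper = relative_risk(10, 1000, 8, 1000)
print(round(rr, 2), round(lower, 2), round(upper, 2))
```

Because the calculation depends only on the four counts, two analysts who start from the same published table should arrive at the same relative risk, which is what makes replication a meaningful check. The sketch also shows why studies with more exposed cases carry more weight: larger counts shrink the standard error and narrow the interval.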
ii. The Existence of Specialized Literature
15. Dr. Done has not identified any specialized literature endorsing his particular methodology, nor has his methodology been endorsed by other experts. Accordingly, there is no likelihood that Dr. Done's methodology or technique has been exposed to critical scientific scrutiny. Downing, 753 F.2d at 1238-39. This factor weighs against admissibility.
iii. The Qualifications and Professional Stature of Expert Witnesses
16. Although Rule 702 establishes a liberal qualification standard, In re Paoli R.R. Yard PCB Litig., 916 F.2d 829, 856 (3d Cir.1990), cert. denied sub nom. General Elec. Co. v. Knight, ___ U.S. ___, 111 S.Ct. 1584, 113 L.Ed.2d 649 (1991), this Court may nonetheless consider Dr. Done's qualifications in connection with the reliability of his methodology. Downing, 753 F.2d at 1239.
17. Dr. Done primarily relies upon the field of epidemiology in formulating his opinion. It is undisputed that Dr. Done has had no formal training in a degree program in epidemiology. Done Test., Tr. 7/10/91, at 12. It is further undisputed that Dr. Done does not have a Bachelor or Doctorate degree in epidemiology nor has he undertaken any fellowship in epidemiology. Done Test., Tr. 7/10/91, at 12. In addition, Dr. Done is not a member of the American College of Epidemiology or the Society for Epidemiologic Research. Done Test., Tr. 7/10/91, at 12.
19. Similarly, defense experts, Drs. Monson, Wright and Lamm, all qualified epidemiologists, in many cases could not replicate Dr. Done's calculations. Monson Direct at 27-28; Wright Direct at 33; Lamm Direct at 15-18.
20. Although Dr. Done's qualifications or lack thereof may not alone constitute a sufficient basis upon which to exclude his testimony, his lack of formal training in epidemiology weighs against admissibility.
iv. The Non-Judicial Uses to Which the Scientific Techniques Are Put
21. Dr. Done has presented no evidence that his methodology has been put to any non-judicial use. Although "the Federal Rules of Evidence contain no requirement that an expert's testimony be based upon reasoning subjected to peer-review and published in the professional literature," DeLuca, 911 F.2d at 954, the fact that Dr. Done's methodology has not been used non-judicially weighs against its admissibility. Downing, 753 F.2d at 1239; see also Perry v. United States, 755 F.2d 888, 892 (11th Cir.1985) ("the examination of a scientific study by a cadre of lawyers is not the same as its examination by others trained in the field of science or medicine").
v. The Frequency with Which a Technique Leads to Erroneous Results
22. The "ultimate touchstone" of the soundness and reliability of a particular methodology or technique "is helpfulness to the trier of fact." DeLuca, 911 F.2d at 956. Helpfulness "turns on whether the expert's `technique or principle [is] sufficiently reliable so that it will aid the jury in reaching accurate results.'" Id. at 956 (quoting 3 J. Weinstein & M. Berger, Weinstein's Evidence ¶ 702, at 702-35 (1988)). In this regard, "Downing teaches that the frequency with which a scientific technique leads to erroneous results bears heavily on its reliability for evidential purposes." DeLuca, 911 F.2d at 956 n. 19 (citing Downing, 753 F.2d at 1239).
23. Dr. Swan, plaintiffs' other expert, did not independently verify the data included on Dr. Done's chart, did not check his calculations and did not check to see if Dr. Done correctly extracted the data from the articles as to which he made no calculations. Swan Test., Tr. 7/12/91, at 7-14.
24. To the extent that Dr. Done used ADR or DER data as a basis for his conclusion that Bendectin is a teratogen, the methodology produces inaccurate and unreliable results because such data are unreliable for determining causation. See Findings of Fact supra at ¶¶ 69-75.
25. The testimony on remand established that Dr. Done's epidemiologic methodology yielded erroneous results so frequently that it is not helpful to the trier of fact. See Findings of Fact supra at ¶¶ 34-54. This factor most certainly weighs against admissibility.
26. Plaintiffs must "make more than a prima facie showing (e.g., the testimony of a single qualified expert) that a technique is reliable." Downing, 753 F.2d at 1240 n. 21. Plaintiffs have failed to make even a prima facie showing that Dr. Done's methodology is reliable.
27. The above Downing factors used in evaluating the reliability and soundness of Dr. Done's methodology weigh against the admissibility of his testimony.
B. The Possibility That Admitting the Evidence Would Overwhelm, Confuse, or Mislead the Jury
28. Testimony also should be excluded if it would overwhelm, confuse or mislead the jury.
29. Although Dr. Done's inclusion and exclusion of certain data may be nothing more than a matter for the experts to battle, when viewed in light of the numerous errors in calculation of that data and selectivity biases, this Court concludes that Dr. Done's testimony would serve to confuse and mislead a jury.
30. There is a danger that scientific evidence will mislead a jury "where the jury is not presented with the data on which the expert relies, but must instead accept the expert's assertions as to the accuracy of his conclusions." Downing, 753 F.2d at 1239. Here, Dr. Done has presented on his charts the post-selection and post-calculation numbers, without explaining the precise derivation of the numbers. Moreover, Dr. Done has not specifically ruled out Mrs. DeLuca's cigarette smoking or her prior abortion as possible causes of Amy's birth defects. The potential for confusion is evident.
31. Because I conclude that Dr. Done's methodology is novel and unreliable, and because his testimony has the potential of confusing and misleading a jury, Federal Rule of Evidence 702 requires its exclusion.
IV. Federal Rule of Evidence 703
32. Federal Rule of Evidence 703 states:
33. Rule 703 and Rule 104(a) require a district court to "`make a factual inquiry ... as to what data experts in the field find reliable.'" DeLuca, 911 F.2d at 952 (quoting In re Japanese Elec. Prods. Antitrust Litig., 723 F.2d 238, 276 (3d Cir.1983), rev'd on other grounds sub nom. Matsushita Elec. Indus. Co., Ltd. v. Zenith Radio Corp., 475 U.S. 574, 106 S.Ct. 1348, 89 L.Ed.2d 538 (1986)). Generally, if an expert avers that his testimony is based on data experts in the field find reliable, then Rule 703's requirements are usually satisfied. DeLuca, 911 F.2d at 952.
34. "Rule 703 is satisfied once there is a showing that an expert's testimony is based on the type of data a reasonable expert in the field would use in rendering an opinion on the subject at issue; it does not address the reliability or general acceptance of an expert's methodology" as does Rule 702. Id. at 953.
35. Even if expert testimony is admissible under Rule 703, it may be excluded as unreliable under Rule 702.
36. Dr. Done purports to have taken the numbers he entered in the boxes on his chart from either the underlying studies themselves, or, in the articles where no calculations were made, from his own calculations. In many cases, as previously noted, this is simply not true. Dr. Done frequently used numbers of his own, even where calculations were available. Often, his "re-calculations" were wrong and could not be replicated by plaintiffs' other expert, Dr. Swan, or the defense experts. See Findings of Fact supra at ¶ 36. The hearing conducted by this Court has revealed that the data used by Dr. Done is not, as was represented to the Court of Appeals,
37. In addition, Dr. Done specifically relied upon several types of data experts in the field would not use in forming their opinions: (i) his own analysis of the ADRs or DERs for Bendectin; (ii) the preliminary Jick drafts, as well as other drafts, see Findings of Fact supra at ¶ 61; and (iii) Dr. Swan's reanalysis of the Cordero data.
38. Dr. Done thus has used data upon which no epidemiologist would rely. This is where Rules 702 and 703 intersect. As the Court of Appeals stated: "If a study's method of data collection is faulty, it may be that no expert would rely upon the data generated as a basis for drawing any inference about the studied subject." DeLuca, 911 F.2d at 955 n. 14.
39. Because I conclude that Dr. Done has used data experts in the field would not use in rendering their opinions on the subject, Federal Rule of Evidence 703 requires its exclusion.
40. Summary judgment is proper where a party fails to establish the existence of an element essential to his case and where he bears the burden of proof. Celotex Corp. v. Catrett, 477 U.S. 317, 106 S.Ct. 2548, 91 L.Ed.2d 265 (1986). Because this Court has concluded that Dr. Done's testimony is inadmissible under Federal Rules of Evidence 702 and 703, I must conclude that plaintiffs have not met their burden under Celotex to produce evidence sufficient to raise a genuine issue of material fact as to whether Amy DeLuca's birth defects were caused by maternal ingestion of Bendectin during pregnancy. Id.; Anderson v. Liberty Lobby, Inc., 477 U.S. 242, 106 S.Ct. 2505, 91 L.Ed.2d 202 (1986); Matsushita Elec. Indus. Co., Ltd. v. Zenith Radio Corp., 475 U.S. 574, 106 S.Ct. 1348, 89 L.Ed.2d 538 (1986). Accordingly, summary judgment will be entered in favor of Merrell Dow.
VI. Sufficiency of the Evidence
41. Because I have concluded that Dr. Done's testimony is inadmissible under Federal Rules of Evidence 702 and 703, I do not reach the issue whether his testimony, if admitted, would meet the applicable burden of proof standard under New Jersey law.
For the foregoing reasons,
It is on this 29th day of April, 1992,
ORDERED that defendant Merrell Dow's motion for summary judgment be and is hereby GRANTED.