Opinion for the Court filed by Chief Judge WALD.
WALD, Chief Judge:
In this action, a class of women plaintiffs allege various forms of unlawful employment discrimination in the Foreign Service from 1976 to 1983. After a trial, the District
I. BACKGROUND INFORMATION
A. The Foreign Service and Its Employment Practices
The Foreign Service is our nation's professional diplomatic corps. Members of the Service represent the interests of this nation abroad and assist the Secretary of State in the formulation of foreign policy at home. See 22 U.S.C. § 3904(1)-(2). The organization of Foreign Service personnel draws on the model of the United States military as well as the United States civil service. See S.Rep. No. 913, 96th Cong., 2d Sess. 2 (1980), U.S.Code Cong. & Admin.News 1980, P. 4419. For example, the Foreign Service is a "rank-in-person" system: members of the Service have an individualized rank which is independent of the rank of the particular job they happen to hold at any given time. H.R.Rep. No. 992, pt. 1, 96th Cong., 2d Sess. 3 (1980).
The Foreign Service also copies the military in its "up or out" personnel system. Individuals must serve a probationary period of up to five years before they can receive a career appointment in the Service. 22 U.S.C. § 3946. If at the end of that period an individual has not received a career appointment, he or she must leave the Service. Id. § 3949. (Although according to the Foreign Service Act of 1980, the term "Foreign Service Officer" refers only to members of the Service with career appointments, and those serving under a limited, probationary appointment are called "career candidates," the parties to this lawsuit use the term "Foreign Service Officer," or "FSO," to refer to those serving under both career and limited appointments. To avoid confusion, we will do likewise.)
The Foreign Service assigns its officers to one of four areas of functional specialization, known as "cones": political, economic, administrative, and consular. Officers in the political and economic cones deal with, respectively, political and economic dimensions to foreign relations and foreign policy. Officers in the administrative cone "are responsible for the support operations of U.S. embassies and consulates." 616 F.Supp. at 1544 (¶ 5). Officers in the consular cone "work closely with the public providing assistance to American travelers and residents abroad, issuing visas [and dealing with] other immigration related issues." Id. (¶ 6). As the District Court expressly found, the State Department does not encourage FSOs to change cones, and "[o]fficers are expected to serve the major portion of their time in the Service" in the cones to which they were initially assigned. Id. (¶¶ 10, 14). Some officers, however, do switch cones. Senior FSOs who have demonstrated leadership ability may transfer into a "prestigious" program direction cone. Id. at 1554 (¶ 104). Other FSOs are occasionally given temporary assignments to other cones or to some "inter-functional" positions. Id. at 1550 (¶ 70).
Most FSOs applying to the Foreign Service at junior entry levels must take a written examination. Beginning in 1975, the examinations have tested applicants for aptitude in all four functional areas, and the Foreign Service has used the results of these examinations to determine a new FSO's initial cone assignment. Id. at 1545 (¶ 15).
Once in the Foreign Service, individuals change specific jobs frequently; the State Department has a policy of assigning individuals to positions for a set period of time, generally two to three years. See id. at 1550 (¶ 71); H.R.Rep. No. 992, pt. 1, 96th Cong., 2d Sess. 3 (1980). Since 1975, job assignments in the Foreign Service have been made pursuant to an Open Assignment Policy, in which all members of the Service receive a list of vacant positions and submit "a bid list" indicating their preferences. These bid lists are compiled into a "bid book" from which assignment panels make their selections, after considering the interests and preferences of the bureau in which each position is located. Id. at 1550 (¶¶ 73, 74). As previously indicated, some FSOs receive "out-of-cone" assignments pursuant to this process, but in the main, job transfers are made inside the cones of initial assignment. In addition, FSOs do not necessarily receive a job position with a rank corresponding to their personal rank. Positions that have a higher rank than the individual's are known as "stretch" assignments. Positions with a lower rank than the individual's are "downstretch" assignments. Pursuant to the Open Assignment Policy, individuals do not receive stretch or downstretch assignments unless they bid for them, but as with any other assignment, individuals do not receive these assignments simply because they bid for them. Id. at 1551 (¶ 77).
The Foreign Service prepares annual written evaluations of its officers' job performance. In addition to rating the actual past performance of FSOs, the evaluations rate the potential of the FSOs' future job performance. 616 F.Supp. at 1549. The State Department also gives out Honor Awards in recognition of outstanding achievement. In descending order of prestige are the Distinguished Honor Award, the Superior Honor Award, and the Meritorious Honor Award. See Plaintiffs' Post-Trial Brief at 112-13.
Except for Senior members, salaries in the Foreign Service are based on a schedule established by the President which consists of nine salary classes. 22 U.S.C. § 3963. The Secretary of State assigns all Foreign Service Officers to a particular salary class. Id. § 3964. By statute, except in limited circumstances, a career candidate for appointment as a Foreign Service Officer may not be initially assigned to a salary class higher than class 4 (class 1 being the highest). Id. § 3947. Usually career candidates are placed initially in class 7 or class 8. Promotions from one salary class to another are made by the Secretary of State after receiving recommendations and rankings submitted by selection boards which evaluate the members of each class. Foreign Service Officers do not compete for promotions until the transition from class 6 to class 5; until then, they are promoted at the end of an established time period if they perform their duties satisfactorily. See Joint Appendix ("J.A.") at 117-121; Defendant's Post-Trial Brief at 96.
B. The History of This Litigation
This class action began over ten years ago when appellants filed their complaint alleging that widespread discrimination against women in the Foreign Service violated Title VII of the Civil Rights Act of 1964, as amended in 1972 to cover employment discrimination in the federal government. See 42 U.S.C. § 2000e-16. The parties subsequently resolved by consent decree all claims relating to admission into the Foreign Service.
This appeal followed from the District Court's failure to find sex discrimination in seven different types of personnel practices.
With respect to each of these seven personnel practices, the appellants offered data showing a disparity between men and women, along with a statistical analysis designed to demonstrate the improbability that a disparity of that scale could result from chance. The data and analysis, they allege, provide a strong basis for inferring that this disparity was the product of unlawful discrimination. In addition, the appellants introduced nonstatistical evidence pertaining generally to the existence of a prejudicial attitude towards women in the Foreign Service from 1976 to 1983. The District Court, however, rejected the inference of unlawful discrimination in each of the seven areas.
In discounting the probative force of appellants' statistics, the District Court said that their statistical studies rested on faulty data, or flawed methodology, or omitted a crucial variable that would explain the disparity between men and women in a nondiscriminatory way. The District Court also said that some of the statistical evidence focused on too narrow a segment of Foreign Service personnel practices. As we shall explain, the District Court's treatment of the appellants' evidence was in some instances contrary to law and in other respects clearly erroneous as a matter of fact.
II. TITLE VII CLAIMS: TWO DIFFERENT THEORIES
Under Title VII a plaintiff can rely on either of two different theories to support a claim of unlawful sex discrimination. A "disparate treatment" claim alleges that the defendant intentionally based an employment decision on the sex of the plaintiffs. See, e.g., International Brotherhood of Teamsters v. United States, 431 U.S. 324, 335 & n. 15, 97 S.Ct. 1843, 1854 & n. 15, 52 L.Ed.2d 396 (1977). Disparate treatment claims can involve an isolated incident of discrimination against a single individual, or, as in this case, allegations of a "pattern or practice" of discrimination affecting an entire class of individuals. Id. A "disparate impact" claim alleges that the defendant based an employment decision on a criterion that although "facially neutral" nevertheless impermissibly disadvantaged individuals of one sex more than the other. Id. at 336 n. 15, 97 S.Ct. at 1854 n. 15. A "classic" example of a disparate impact claim is a case in which plaintiffs allege that the defendant based employment decisions on the results of a test for which members of one sex on average received lower scores than members of the other sex. See B. Schlei & P. Grossman, Employment Discrimination Law at 13 (1983-84 Supp.); see also Griggs v. Duke Power Co., 401 U.S. 424, 91 S.Ct. 849, 28 L.Ed.2d 158 (1971) (the original disparate impact case).
Because these two theories are distinct, we must consider them separately. Appellants' only disparate impact claim concerns the initial cone assignments; the other six claims involve disparate treatment and we will consider them first.
III. LEGAL PRINCIPLES APPLYING TO PATTERN OR PRACTICE DISPARATE TREATMENT CLAIMS
In a typical sex discrimination pattern or practice disparate treatment case, plaintiffs allege the existence of a disparity between men and women in selection rates for a particular job or job benefit and further allege that this disparity was caused by an unlawful bias against members of the disadvantaged sex, usually women. To prevail in their claim, plaintiffs must prove, by a preponderance of the evidence, that these allegations are true. Proof of the disparity itself is based upon a comparison of the proportion of those women eligible for selection who were actually selected with the corresponding proportion of eligible men who were actually selected. Plaintiffs establish a disparity disfavoring women if the evidence demonstrates that the selection rate for eligible women was less than the selection rate for eligible men. Sometimes, the disparity is expressed as the difference between the number of women actually selected and the number of women one would expect to have been selected, assuming equality in the selection rates for men and women. (If one knows the number of women eligible and the selection rate for men, one can determine, using algebra, the expected number of successful women.)
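The algebra described in the preceding paragraph can be sketched in a few lines of Python. This is only an illustration: the function names and all of the figures below are hypothetical and do not come from the record in this case. The sketch assumes, as the paragraph does, that the expected number of successful women is the number of eligible women multiplied by the selection rate for men.

```python
def selection_rate(selected: int, eligible: int) -> float:
    """Proportion of an eligible group that was actually selected."""
    if eligible <= 0:
        raise ValueError("eligible must be positive")
    return selected / eligible


def expected_women_selected(women_eligible: int, men_selected: int,
                            men_eligible: int) -> float:
    """Women one would expect to be selected at the men's selection rate."""
    if men_eligible <= 0:
        raise ValueError("men_eligible must be positive")
    return women_eligible * men_selected / men_eligible


# Hypothetical figures for illustration only (not from the record).
women_eligible, women_selected = 200, 45
men_eligible, men_selected = 1000, 300

expected = expected_women_selected(women_eligible, men_selected, men_eligible)
shortfall = expected - women_selected  # expected minus actual selections

print(selection_rate(women_selected, women_eligible))  # 0.225
print(selection_rate(men_selected, men_eligible))      # 0.3
print(expected, shortfall)                             # 60.0 15.0
```

On these assumed figures the selection rate for women (22.5%) is below the rate for men (30%), and the disparity can equivalently be stated as a shortfall of 15 women against an expected 60.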
Proof that the observed disparity was caused by an unlawful bias against women need not be direct. Circumstantial evidence that the disparity, more likely than not, was a product of unlawful discrimination will suffice to prove a pattern or practice disparate treatment case. See Teamsters, 431 U.S. at 335 n. 15, 97 S.Ct. at 1854 n. 15. Indeed, this circumstantial evidence may itself be entirely statistical in nature. See, e.g., Segar v. Smith, 738 F.2d 1249, 1278-79 (D.C.Cir.1984), cert. denied sub nom. Meese v. Segar, 471 U.S. 1115, 105 S.Ct. 2357, 86 L.Ed.2d 258 (1985). In this case, appellants rely to a great extent on statistical evidence to prove their claims of disparate treatment. We find it necessary, therefore, to discuss how statistical analysis of an observed disparity can raise an inference of unlawful discrimination.
A. Raising An Inference of Discrimination With Statistical Evidence
A disparity between the selection rates of men and women for a particular job or job benefit has one of three possible causes. See D. Baldus & J. Cole, Statistical Proof of Discrimination 291 (1980). First, the disparity may be a product of an unlawful discriminatory animus; this is
A statistical analysis of a disparity in selection rates can reveal the probability that the disparity is merely a random deviation from perfectly equal selection rates. Statistics, however, cannot entirely rule out the possibility that chance caused the disparity. Nor can statistics determine, if chance is an unlikely explanation, whether the more probable cause was intentional discrimination or a legitimate nondiscriminatory factor in the selection process. See id. at 290-92.
Title VII nevertheless provides that if the disparity between selection rates for men and women is sufficiently large so that the probability that the disparity resulted from chance is sufficiently small, then a court will infer from the numbers alone that, more likely than not, the disparity was a product of unlawful discrimination, unless the defendant can introduce evidence of a nondiscriminatory explanation for the disparity or can rebut the inference of discrimination in some other way. See Hazelwood School District v. United States, 433 U.S. 299, 307-08, 97 S.Ct. 2736, 2741, 53 L.Ed.2d 768 (1977) ("Where gross statistical disparities can be shown, they alone may in a proper case constitute prima facie proof of a pattern or practice of discrimination."); see also Segar, 738 F.2d at 1278 ("[W]hen a plaintiff's methodology focuses on the appropriate labor pool and generates evidence of [a disparity] at a statistically significant level," this evidence alone will be "sufficient to support an inference of discrimination.").
This court, using different terminology, has stated that statistical evidence meeting "the .05 level of significance ... [is] certainly sufficient to support an inference of discrimination." Segar, 738 F.2d at 1283. "[T]he .05 level," the Segar opinion explained, "indicates that the odds are one in 20 that the result could have occurred by chance." Id. at 1282. (This statement is somewhat imprecise and has predictably led to confusion, as we discuss infra.) The Segar court justified the consistency of its statement with the statements of the Supreme Court by observing that "[a] level of two standard deviations corresponds to statistical significance at the .05 level." Id. at 1283 n. 28. In this case, the District Court cited Segar in its Conclusions of Law, stating: "The Court adopts the .05 level for establishing that a [statistical] study is statistically significant." 616 F.Supp. at 1559 (¶ 14). But the District Court then went on to say that "[t]he .05 level generally corresponds to 1.65 standard deviations." Id.
How can a 5% probability of randomness correspond both to a measurement of two standard deviations and a measurement of 1.65 standard deviations, one may reasonably ask? There is a legitimate answer: it depends on whether one is using a "one-tailed" or a "two-tailed" test of statistical significance. A disparity measuring 1.65 standard deviations corresponds to a 5% probability of randomness under a one-tailed test. A disparity measuring two standard deviations (to be more precise, 1.96 standard deviations) corresponds to a 5% probability of randomness under a two-tailed test.
This difference between one-tailed and two-tailed tests obviously requires further explanation. It also presages the obvious question, given the substantial differences in result, of which test is the more appropriate one to use in Title VII cases. Neither this court's opinion in Segar nor the District Court's opinion in this case discusses the difference between "one-tailed" and "two-tailed" approaches. The Supreme Court has given us no explicit guidance on this issue. And, unfortunately, neither side to this litigation has devoted more than a single footnote to this difficult but important issue. See Appellants' Reply Brief at 32 n. 38; Appellee's Brief at 62 n. 73. For obvious reasons we, too, confront this issue with some trepidation. But appellants' and appellee's evidence on the underpromotion
Given the unavoidability of embarking upon a journey into the statistical maze, we begin with the terms "one-tailed" and "two-tailed";
But for every deviation from the mean of a normal distribution, measured in a certain number of standard deviations, there are two distinct ways of referring to the
We can speak of the probability measurement associated with 2.17 standard deviations in another way, however. Although the observed disparity between the actual and expected number of women in this example was an underselection of women, there is a corresponding possibility that women might randomly be overselected such that the difference between the expected number of women selected and the number of women selected due to this random overselection also measures 2.17 standard deviations. The probability of a random deviation from the expected number of women selected with a magnitude of 2.17 standard deviations or larger, resulting from either an underselection or overselection of women, corresponds to the area under the bell curve between 2.17 standard deviations and both extremes of the curve: 3%.
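The 3% figure can be checked numerically. The following is a minimal sketch, assuming the disparity is measured against a standard normal (bell) curve and using Python's standard-library `statistics.NormalDist`; the function names are illustrative only. The area in a single tail beyond 2.17 standard deviations is half of the two-tailed area.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # bell curve with mean 0, standard deviation 1


def one_tailed_probability(z: float) -> float:
    """Area under the curve beyond |z| standard deviations in one tail."""
    return 1.0 - STANDARD_NORMAL.cdf(abs(z))


def two_tailed_probability(z: float) -> float:
    """Area beyond |z| standard deviations in both tails combined."""
    return 2.0 * one_tailed_probability(z)


print(round(one_tailed_probability(2.17), 3))  # 0.015
print(round(two_tailed_probability(2.17), 3))  # 0.03
```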
The difference between "one-tailed" and "two-tailed" tests of statistical significance stems from these two different ways of measuring probability. If one decides (as the Segar court did) to reject the hypothesis that an observed disparity from an expected result occurred randomly only if the observed disparity falls within the range of the 5% most extreme possible disparities, one must still decide whether the 5% range should be entirely within only one of the tails of the bell curve, or instead should be divided with half of the range in each tail. Five percent of the total bell curve can be found either in the range from 1.65 standard deviations from the mean to one extreme end of the bell curve or in the area from 1.96 standard deviations to both extreme ends of the bell curve. Compare Diagrams 2 and 3, copied from V. Cangelosi, P. Taylor & P. Rice, Basic Statistics 173-74 (1979). For this reason, a 5% probability of randomness corresponds to 1.65 or 1.96 standard deviations, depending upon whether one uses a one-tailed or a two-tailed test. (Similarly, 1.65 standard deviations correspond to a 10% probability of randomness under a two-tailed test; and 1.96 standard deviations correspond to a 2.5% probability of randomness under a one-tailed test.)
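The correspondence between a significance level and a number of standard deviations can be reproduced by inverting the bell curve. This is a sketch under the assumption of a standard normal distribution; `critical_value` is a hypothetical helper name, and the computed one-tailed figure (about 1.645) is the value conventionally rounded to 1.65.

```python
from statistics import NormalDist


def critical_value(significance: float, tails: int) -> float:
    """Standard deviations needed to reach a significance level.

    tails=1 places the whole rejection region in one tail of the curve;
    tails=2 splits it evenly between the two tails.
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    if not 0.0 < significance < 1.0:
        raise ValueError("significance must be between 0 and 1")
    return NormalDist().inv_cdf(1.0 - significance / tails)


print(round(critical_value(0.05, tails=1), 3))  # 1.645
print(round(critical_value(0.05, tails=2), 3))  # 1.96
```

The same helper confirms the parenthetical: a 10% level under a two-tailed test and a 5% level under a one-tailed test yield the same threshold, as do 5% two-tailed and 2.5% one-tailed.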
We are now, hopefully, in a position to address whether in a Title VII case, a court should use a one-tailed or two-tailed test to determine whether statistical evidence alone should raise an inference of unlawful discrimination, recognizing that there is a difference of opinion among courts and commentators on the issue. Compare, e.g., EEOC v. Federal Reserve Bank of Richmond, 698 F.2d 633 (4th Cir.1983), rev'd on other grounds sub nom. Cooper v. Federal Reserve Bank of Richmond, 467 U.S. 867, 104 S.Ct. 2794, 81 L.Ed.2d 718 (1984), with Little v. Master-Bilt Products, Inc., 506 F.Supp. 319 (N.D.Miss.1980). Indeed, one leading treatise on the role of statistical evidence in Title VII litigation has shifted its position between the publication of the main text and the publication of a supplement. In the main text of their book, Baldus and Cole write:
D. Baldus & J. Cole, Statistical Proof of Discrimination 307-08 (1980) (footnote omitted). In the most recent supplement, however, the authors criticize as "unnecessarily strict" the Fourth Circuit's decision in EEOC v. Federal Reserve Bank of Richmond to require a two-tailed approach unless "independent evidence indicates the presence of discrimination of the type being challenged." D. Baldus & J. Cole, Statistical Proof of Discrimination 129 (1986 Cumulative Supp.) (footnote omitted). Baldus and Cole then state a preference for a legal rule that would allow a one-tailed test "if the possibility of intentional discrimination favoring the protected group represented by plaintiff [e.g., women in this case] can be ruled out as defying logic, i.e., the available evidence excluding the statistic in question gives strong support to the conclusion that the system is either nondiscriminatory or disadvantageous to the plaintiff's group." Id. at 129-30. In a footnote to this passage, the authors continue:
Id. at 130 n. 38.
Although the latest position adopted by Baldus and Cole makes some sense, we reject its applicability to the present case. We note that some of appellants' claims of unlawful discrimination involved complaints that women were overselected for particular kinds of jobs, e.g., consular cone and downstretch assignments. Appellants undoubtedly have the right under Title VII to object to the State Department's selection of FSOs for these positions on the basis of sex. Such claims of discriminatory overselection, however, require a two-tailed statistical analysis. Appellants may view consular assignments as inferior to political assignments, but another class of women plaintiffs could certainly bring a Title VII claim if women were intentionally underassigned to the consular cone. Consequently, statistically significant deviations in either direction from an equality in selection rates would constitute a prima facie case of unlawful discrimination. Indeed, appellants' own statistical expert testified that a two-tailed test was necessary in evaluating the disparity between men and women in assignments to the consular cone because the hypothesis to be tested is whether cone assignments are made without regard to sex. See Transcript (Tr.) at 1081.
We also think a two-tailed test of statistical significance should be applied to all of appellants' discrimination claims in this case. First, Baldus and Cole originally noted the importance of consistency in evaluating statistical evidence. Second, although we by no means intend entirely to foreclose the use of one-tailed tests, we think that generally two-tailed tests are more appropriate in Title VII cases. After all, the hypothesis to be tested in any disparate treatment claim should generally be that the selection process treated men and women equally, not that the selection process treated women at least as well as or better than men. Two-tailed tests are used where the hypothesis to be rejected is that certain proportions are equal and not that one proportion is equal to or greater than
Moreover, even if a disparity in only one direction is at issue in a particular Title VII case (e.g., only the underpromotion and not the overpromotion of women), we think that the more appropriate assessment of the probability that the contested disparity resulted from chance requires a recognition that a random disparity of equal magnitude, but in the opposite direction, is equally likely. For example, if plaintiffs in a Title VII case come into court simply with evidence that women were underselected for a particular job, and that this disparity measured 1.75 standard deviations, it is perfectly true that the probability of women being underselected to this extent or more by chance is only 4%. Under a one-tailed test of statistical significance, employing the 5% level, as this court did in Segar, this evidence alone would establish a prima facie case of disparate treatment.
But for a disparity measuring 1.75 standard deviations it is equally true that the probability of a random deviation of this magnitude or larger, either underselecting or overselecting women, is 8%. In other words, disparities of this magnitude will be consistent with the hypothesis that the selection process did not treat men and women differently in 8% of the cases. Even if in the case before the court the disparity disfavors women and not men, how can the court ignore the possibility that the case might still be one of the 8% of cases in which a fair selection process would by chance produce disparities of this magnitude or greater? Thus, we think a court should generally adopt a two-tailed approach to evaluating the probability that the contested disparity resulted from chance. Furthermore, although an 8% probability is pretty low, we do not think that it is low enough to establish by itself an inference of unlawful discriminatory animus. We think that statistical evidence must meet the 5% level referred to in Segar for it alone to establish a prima facie case under Title VII. Taken together, as we have said, a two-tailed test and a 5% probability of randomness require statistical evidence measuring 1.96 standard deviations. Consequently, if plaintiffs come into court relying only on evidence that the underselection of women for a particular job measured 1.75 standard deviations, it seems improper for a court to establish an inference of disparate treatment on the basis of this evidence alone.
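The arithmetic of the 1.75 standard deviation example can be sketched as follows. This is an illustration of the numerical comparison only, assuming a standard normal distribution and the 5% level; the helper names are hypothetical, and the sketch says nothing about non-statistical evidence, which may independently support an inference of discrimination.

```python
from statistics import NormalDist


def chance_probability(z: float, tails: int = 2) -> float:
    """Probability of a random disparity of |z| standard deviations or more.

    tails=1 counts deviations in one direction only;
    tails=2 counts deviations of that magnitude in either direction.
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return tails * (1.0 - NormalDist().cdf(abs(z)))


def meets_significance_level(z: float, significance: float = 0.05,
                             tails: int = 2) -> bool:
    """True if the chance probability is at or below the significance level."""
    return chance_probability(z, tails) <= significance


print(round(chance_probability(1.75, tails=1), 2))  # 0.04
print(round(chance_probability(1.75, tails=2), 2))  # 0.08
print(meets_significance_level(1.75, tails=1))      # True
print(meets_significance_level(1.75, tails=2))      # False
print(meets_significance_level(1.96, tails=2))      # True
```

Defaulting `tails` to 2 mirrors the general preference stated above for the two-tailed approach; a caller who wants the one-tailed figure must ask for it explicitly.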
Of course, plaintiffs in Title VII pattern or practice cases need not rely on statistical evidence alone. Because the ultimate issue in a disparate treatment case is whether the disparity resulted from unlawful discriminatory animus, plaintiffs may introduce any additional evidence which is probative on this issue. Thus, plaintiffs are in no way foreclosed from establishing an inference of discrimination simply because the contested disparity falls short of the 1.96 standard deviations mark when analyzed statistically. Obviously, to use an extreme example, if an employer admits under cross-examination that assignments for a certain position were based in large part on sex, it matters not that the observed underselection of women measures only 1.75 standard deviations. When plaintiffs in a Title VII pattern or practice case rely on evidence in addition to the evidence of the disparity itself, the issue for the trier of fact in determining whether the plaintiffs have established a prima facie case must be whether the totality of plaintiffs' evidence (again including the evidence of the disparity itself) demonstrates that,
B. The Applicability of Title VII to Any Personnel Action
A plaintiff may bring a Title VII claim for alleged discrimination with respect to any employment decision by an agency of the federal government. The statute itself states that "all personnel actions affecting employees or applicants for employment ... shall be made free from any discrimination based on ... sex." 42 U.S.C. § 2000e-16. In the Foreign Service Act of 1980, Congress reiterated this requirement specifically for Foreign Service employment practices. 22 U.S.C. § 3905.
From this statutory language, two legal principles necessarily follow. First, appellants in this case may bring a disparate treatment claim regarding discrimination in any type of personnel decision regardless of whether or not that discrimination has an effect on other, arguably more important, personnel decisions. Thus, if the State Department has intentionally discriminated against women in certain types of assignment decisions, the State Department has violated 42 U.S.C. § 2000e-16 even if the State Department can prove that the unlawful discrimination in assignments did not adversely affect the opportunities of women for promotion in the Foreign Service.
It is beyond dispute that the State Department may not discriminate against women in making any kind of employment decision, and if the State Department breaches this requirement, appellants have a cause of action to vindicate their statutory rights. We note, as further support of our interpretation of 42 U.S.C. § 2000e-16, that the Supreme Court last Term interpreted an analogous Title VII provision applying to private employers to encompass a claim of sex discrimination for sexual harassment even if the sexual harassment caused no tangible or economic loss. Meritor Savings Bank, FSB v. Vinson, ___ U.S. ___, 106 S.Ct. 2399, 91 L.Ed.2d 49 (1986). The provision of Title VII involved in Vinson makes it "an unlawful employment practice for an employer ... to discriminate
Second, and relatedly, if plaintiffs in a Title VII case claim discrimination in certain kinds of employment decisions, it is no defense that the government did not discriminate against women in other kinds of employment decisions. For example, if the State Department intentionally underselected women for appointment as Deputy Chiefs of Mission (DCM), the State Department has violated 42 U.S.C. § 2000e-16 even if the State Department can prove that it did not discriminate against women in assignments to five other "high visibility" positions. Appellants need not allege or prove discrimination in assignments to other "high visibility" positions in order to maintain a cause of action with respect to discrimination in DCM assignments. As the Supreme Court has stated: "Of course, Title VII provides for equal opportunity to compete for any job." Teamsters, 431 U.S. at 338 n. 18, 97 S.Ct. at 1856 n. 18 (emphasis in original).
Although under 42 U.S.C. § 2000e-16 appellants must not be required to prove discrimination in employment decisions other than the ones they are specifically contesting, the government is correct in arguing that evidence of nondiscrimination in those other employment decisions may be probative of whether intentional discrimination actually occurred in the contested employment decisions. For example, if an employer can demonstrate that it did not discriminate against women at several steps of a promotional ladder, that evidence, in some circumstances, may reasonably suggest that the employer did not discriminate in the step at issue either.
But courts must be especially careful in judging the relevance of this kind of evidence lest they contravene the legal rule that under 42 U.S.C. § 2000e-16 plaintiffs need not prove discrimination in personnel actions other than those specifically at issue. The evidence supporting an inference of unlawful discrimination in certain employment decisions may be sufficiently strong that evidence of nondiscrimination in other employment decisions cannot rebut this inference. Thus, in some cases the strength of appellants' prima facie case is so great that even if they were to agree to a stipulation that sex discrimination did not occur in other employment decisions, their evidence as to the employment decisions specifically at issue would still prove that, more likely than not, unlawful discrimination occurred.
When all the evidence raising and rebutting the inference of discrimination is statistical, according the proper deference to each legal principle is a delicate task indeed. If Title VII plaintiffs are able to muster only the most marginal inference of discrimination in only one type of job decision (e.g., the underselection of women in one promotional class measures only 1.98 standard deviations), then an inference of discrimination may be undercut by the fact that women are demonstrably not underselected in other similar job decisions. But even here courts must be wary. Evidence that the underselection of women in another similar job decision measures just below the 1.96 threshold, while not sufficient to prove discrimination, is not compelling evidence that the employer did not discriminate in this other employment decision.
Thus, when plaintiffs in a Title VII case introduce statistical evidence of an extreme disparity in the selection rates for men and women for a certain type of job, the fact that these plaintiffs have insufficient evidence to establish an inference of discrimination regarding other employment decisions should not block an inference of discrimination on the specific type of employment decision at issue. For example, if Title VII plaintiffs present evidence that the underselection of women for a particular type of job assignment measures above 3.0 standard deviations, this evidence necessarily raises an inference of discrimination
C. Rebutting the Inference of Disparate Treatment
As we have discussed, under Title VII courts will initially infer that a disparity between men and women in selection rates for a particular job or job assignment results from unlawful discrimination if the disparity is large enough: i.e., measures at least 1.96 standard deviations. But defendants in Title VII cases must be offered an opportunity to rebut this inference by showing that the disparity, albeit nonrandom in cause, resulted from some legitimate, nondiscriminatory factor. Similarly, defendants must be allowed to rebut the inference of discrimination by, alternatively, challenging the statistical calculations upon which the inference of discrimination is based. For example, the statistics may rely on faulty data, flawed computations, or improper methodologies. A recent Supreme Court opinion provides courts with some guidance on how to treat attempts to attack an inference of discrimination based on statistical evidence alone. See Bazemore v. Friday, ___ U.S. ___, 106 S.Ct. 3000, 92 L.Ed.2d 315 (1986).
In Bazemore, the United States District Court for the Eastern District of North Carolina was presented with statistical evidence that black employees of the North Carolina Agricultural Extension Service received substantially lower salaries than white employees working in the same job positions. The District Court determined that "the statistical evidence of plaintiffs standing alone and without further explanation probably suffices to make out a prima facie showing of discrimination in salaries." Civil Action No. 2879, Mem. Op. at 47 (August 22, 1982). The defendants in Bazemore, however, argued that plaintiffs' statistics failed to account for several factors, any of which would provide a legitimate, nondiscriminatory explanation for the salary disparities. Id. at 48. The District Court agreed with the defendants, holding that because defendants had demonstrated that these other factors might have caused the salary disparities, defendants successfully rebutted plaintiffs' inference of disparate treatment:
Id. at 54-55 (citation and footnotes omitted).
The Supreme Court reversed. In a unanimous opinion for the Court, Justice Brennan responded to the Fourth Circuit's "plainly incorrect" approach to statistical evidence:
106 S.Ct. at 3009.
Elsewhere in the opinion, Justice Brennan makes plain that the determination by the District Court whether discrimination exists or not "is subject to the clearly erroneous standard of appellate review." Id. at 3008. While the Supreme Court remanded the case to the Fourth Circuit to definitively determine whether, "based on the entire evidence in the record," the District Court's decision had been clearly erroneous, the Justices did declare, "we think that consideration of the evidence makes a strong case for finding the District Court clearly erroneous." Id. at 3010-11 (footnote omitted). Rather than viewing the inclusion of "pre-Act" salaries in the statistical study as rendering the study fatally flawed, the Supreme Court stated that "evidence of pre-Act discrimination is quite probative." 106 S.Ct. at 3010 n. 13. Similarly, the Supreme Court rejected the assumption made by both the District Court and the Fourth Circuit that county-to-county variations in certain pay increases undermined plaintiffs' statistical conclusions: "Absent a disproportionate concentration of blacks in such counties, it is difficult, if
Thus, Bazemore instructs lower courts to be cautious about dismissing plaintiffs' statistical studies as not probative simply because defendant offers some nondiscriminatory explanation for the disparities shown. Implicit in the Bazemore holding is the principle that a mere conjecture or assertion on the defendant's part that some missing factor would explain the existing disparities between men and women generally cannot defeat the inference of discrimination created by plaintiffs' statistics. To be sure, as the Supreme Court acknowledged in Bazemore, there may be a few instances in which the relevance of a factor to the selection process is so obvious that the defendants, by merely pointing out its omission, can defeat the inference of discrimination created by the plaintiffs' statistics. See 106 S.Ct. at 3009 n. 10. The logic of Bazemore, however, dictates that in most cases a defendant cannot rebut statistical evidence by mere conjectures or assertions, without introducing evidence to support the contention that the missing factor can explain the disparities as a product of a legitimate, nondiscriminatory selection criterion.
This court, even before Bazemore, had explicitly endorsed the same principle, most recently in a situation where the government attempted to rebut the inference of discrimination arising from evidence that blacks in the Drug Enforcement Administration were paid less and promoted less rapidly than whites. The government argued that blacks were less likely than whites to have an extra year of "specialized experience" over and above minimal qualifications. We rejected the argument because the DEA failed to introduce any evidence to substantiate its assertion:
Segar, 738 F.2d at 1277.
IV. A REVIEW OF THE DISPARATE TREATMENT CLAIMS IN THIS CASE
Having discussed the applicable legal principles, we now address the specific disparate treatment claims at issue in this case. Supreme Court precedent has made plain the appropriate standard for reviewing a district court's determination that employment decisions were not the product of an unlawful discriminatory animus. We can reverse this factual finding only if it is clearly erroneous in light of all the evidence in the record or if it rests on legal error. See Bazemore v. Friday, ___ U.S. ___, 106 S.Ct. 3000, 92 L.Ed.2d 315 (1986); Anderson v. City of Bessemer City, 470 U.S. 564, 105 S.Ct. 1504, 84 L.Ed.2d 518 (1985); Pullman-Standard v. Swint, 456 U.S. 273,
A. Promotions and Evaluations
The Secretary of State argues that appellants' claim of "class-wide promotion discrimination lie[s] at the heart of this case." Appellee's Brief at 58. We agree.
Appellants claim that the State Department discriminated against women in promoting FSOs from class 5 to class 4 from 1976 to 1983. According to the government's own evidence, fewer women than expected were actually promoted to class 4 during that time period, given the number of promotion-eligible women in class 5. The government's own statistical analysis, whose methodology the District Court found to be more accurate than appellants', concluded that the discrepancy between the actual and expected number of women promoted measured 1.76 standard deviations. See 616 F.Supp. at 1557; Defendant's Exhibit 8A at 14 (Table 1, Model 2). As the District Court noted, this measurement means that the probability of an underpromotion of women this large or larger (a one-tailed inquiry) occurring randomly measures slightly less than 4%. 616 F.Supp. at 1557. As we have discussed, under a one-tailed test this number meets the 5% level set forth in Segar. But the corresponding probability of a random deviation from the expected number of women, either favoring or disfavoring women (a two-tailed inquiry), with a magnitude this large or larger is slightly less than 8%. See Defendant's Exhibit 8A at 14 (Table 1, Model 2). Thus under a two-tailed test, this number fails to meet the 5% level.
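The one-tailed and two-tailed figures discussed above can be checked with a short calculation. The following is a minimal sketch and not part of the record: it assumes that the disparity of 1.76 standard deviations can be treated as a z-score under a standard normal distribution, and the function names are illustrative.

```python
from statistics import NormalDist


def one_tailed_p(z: float) -> float:
    """Probability of a random deviation at least this large in one direction."""
    return 1.0 - NormalDist().cdf(abs(z))


def two_tailed_p(z: float) -> float:
    """Probability of a random deviation at least this large in either direction."""
    return 2.0 * one_tailed_p(z)


z = 1.76  # disparity in class 5 to class 4 promotions, in standard deviations
print(f"one-tailed: {one_tailed_p(z):.2%}")
print(f"two-tailed: {two_tailed_p(z):.2%}")
print("meets 5% level, one-tailed:", one_tailed_p(z) < 0.05)
print("meets 5% level, two-tailed:", two_tailed_p(z) < 0.05)
```

Under these assumptions the one-tailed probability falls slightly below 4% and the two-tailed probability slightly below 8%, so the same disparity satisfies the 5% level under one test and fails it under the other.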
For the reasons set forth in Part III. A., we do not think this evidence alone is sufficient to prove an intent to discriminate against women. Appellants at trial, however, relied on additional evidence to prove a discriminatory motive. Appellants first point to evidence in the record of a general prejudicial attitude against women within the Foreign Service during this time period and argue that this evidence supports the proposition that the discrepancy between the actual and expected number of women promoted to class 4 results from a prejudicial attitude against women that violates Title VII.
This evidence includes statements made upon cross-examination by the defense witness, Benjamin Reid, who was Undersecretary of State for Management from 1977-1981. Reid testified that the Foreign Service, as a result of traditionally being "white, male, and Ivy League," had "set ways of doing things" and that although during his tenure the Foreign Service "had come a long way," it nevertheless "still had a long way to go" at the time he left in correcting these biased attitudes. Tr. at 3279-80. Similarly, the appellants introduced into evidence a report written in 1977 by a committee within the State Department asserting that "both attitudinal resistance to equal employment opportunity and discriminatory behavior are still widespread in the Department." Plaintiffs' Exhibit 29 at 6. The appellants also introduced into evidence a report published in 1984 by the Women's Research and Education Institute of the Congressional Caucus for Women's Issues, which stated that "`what some identify as traditional elitist attitudes have [worked] to limit severely employment opportunities for women and minorities [in the Foreign Service].'" Plaintiffs' Exhibit 88 at 10 (quoting a 1981 report prepared by the U.S. Commission on Civil Rights).
More specifically, as proof that the underpromotion of women FSOs from class 5 to class 4 resulted from a prejudicial attitude against women, the appellants relied upon evidence that the State Department believed that women FSOs had less potential for advancement than men FSOs even though men and women FSOs performed their duties with the same skill. A random sample of the evaluation reports for over 400 FSOs in classes 5 and 6 revealed that although "there was no significant difference in the performance ratings of men and women, ... the disparity between men and women [in their potential ratings] measured 2.49 standard deviations." 616 F.Supp. at 1549 (¶ 62) (emphasis added). As the District Court noted, this measurement means the likelihood of women being
The relevance of this evidence to whether the underpromotion of women from class 5 to class 4 resulted from a discriminatory attitude against women is obvious. As the State Department itself asserted and the District Court expressly found, competitive promotion decisions in the Foreign Service were based primarily on an "assessment of the officer's potential to perform at the next higher level." 616 F.Supp. at 1555 (¶ 114); Defendant's Post-Trial Brief at 92. If a biased attitude towards women was causing the State Department to underrate the potential of class 5 women FSOs in their evaluation reports, even though these women were on average performing as well as their male counterparts, one might well expect that this same biased attitude would be at work in the promotion decision itself.
The District Court, however, never considered the evidence of a discriminatory attitude about the potential of women derived from the evaluations in deciding whether appellants had proved, by a preponderance of all the evidence, discriminatory intent in the decisions pertaining to promotions from class 5 to class 4. Rather, the District Court offered the following grounds for rejecting the evidence relating to the evaluation reports:
616 F.Supp. at 1560 (¶ 25).
In our view this reasoning puts the cart before the horse. The District Court cannot determine that the State Department did not discriminate against women in promotions from class 5 to class 4 until it considers whether or not all the evidence demonstrates a biased attitude towards women and their capabilities. It cannot reject relevant evidence of discriminatory intent on the basis of a conclusion that no discrimination occurred without reference to the relevant evidence. To rule otherwise would convert Title VII into a Catch-22: in order to establish a promotional disparate treatment claim, a plaintiff must prove discriminatory intent; but she cannot offer proof of discriminatory intent in the form of disparate ratings between men and women as to their potential unless she has already established a promotional disparate treatment claim. We hold that appellants were entitled, as a matter of law, to have the District Court consider evidence in the ratings of a discriminatory attitude about the potential of women when evaluating appellants' disparate treatment claim concerning promotions from class 5 to class 4. Conversely, it was an error of law for the District Court to "reason" backwards and dismiss appellants' claim that the disparity in potential ratings was a violation of Title VII on the grounds that the court had already determined that the State Department did not discriminate against women in promoting FSOs from class 5 to class 4.
Thus, we reverse both the District Court's decision that the State Department did not discriminate against women in evaluating the potential of FSOs and its decision that there was no discrimination shown in promoting FSOs from class 5 to class 4. Following the command of Pullman-Standard v. Swint, 456 U.S. 273, 291-92, 102 S.Ct. 1781, 1791-92, 72 L.Ed.2d 66 (1982), we remand the case for further factfinding where the record permits more than one resolution of a factual issue. With respect to the question of whether the
Upon remand the District Court must consider whether, on the basis of the existing record, the evidence pertaining to the disparity in potential ratings, together with the nonstatistical evidence of a generally hostile attitude against women in the Foreign Service and the statistical evidence of the disparity in class 5 to class 4 promotions, is sufficient proof that, more likely than not, the underpromotion of women from class 5 to class 4 was based on discrimination. The evidence in the record cutting the other way is the failure of the appellants' statistical evidence to make out even a prima facie case that the State Department discriminated against women at other grades of the promotional process. Of course, as we have pointed out, appellants need not prove discrimination in these other promotion decisions in order to prevail in their disparate treatment claim concerning promotions from class 5 to class 4. Indeed, it is quite plausible that a discriminatory attitude about women and their potential for further advancement might affect promotions only at a mid-level step — like the transition from class 5 to class 4. First of all, as we discussed in Part I. A, supra, the promotions in the junior ranks (classes 7 and 8) were noncompetitive. Second, the Secretary's own statistical analysis showed that fewer women than one would expect were actually promoted from class 6 to class 5, although his study indicated that this disparity was just as likely to be a random deviation in a nondiscriminatory system as a symptom of discrimination. See Defendant's Exhibit 8A, Table 1, Model 2. Finally, one might surmise that those women who survive a discriminatory bias in critical mid-level promotion decisions have demonstrated such superior skill and aptitude that they would encounter less resistance to advancement in upper level positions. 
Despite all these considerations, the District Court is entitled to determine for itself on remand whether the government's evidence of nondiscrimination at other promotional levels is sufficient to outweigh the appellants' evidence, which as we have said includes three distinct elements: the disparity itself measuring 1.76 standard deviations, testimony and documented evidence of a general bias against women in the State Department, and the specific evidence as to discriminatory attitudes about the potential of women FSOs for future advancement, revealed in the evaluation reports of class 5 and 6 FSOs.
With respect to the evaluation reports, we note that the District Court committed a further error of law. In discussing the appellants' statistical analysis of the potential ratings for men and women, the court stated that:
616 F.Supp. at 1549 (¶ 65).
There was, in fact, no evidence whatsoever introduced at trial on which the District Court could rely to base its assumption that despite equivalence in actual performance officers with less experience would be viewed as having lower potential than those with more experience. See Appellants' Brief at 42. Moreover, the District Court's assumption is counterintuitive: if officers with less experience managed to perform at the same level as officers with more experience, one would expect that the less experienced officers would be seen as quick learners with more, not less, potential. In any event, the District Court was not entitled to rely on mere conjecture to undercut the probative force of appellants' statistics. See, supra, Part III.C. On remand, in deciding whether appellants' evidence concerning the evaluation reports demonstrated a bias against women, the
We note further that, even if the rating evidence proves insufficient to prove a discriminatory motive in promotions, appellants are entitled, as a matter of law, to bring an independent claim of disparate treatment with respect to the evaluation reports themselves. As we have seen, the Foreign Service Act of 1980 specifically includes any "evaluation" as a "personnel action" that must be free from discrimination. In light of this express statutory language, we cannot but read the words "all personnel actions" in 42 U.S.C. § 2000e-16 as encompassing such a claim. Thus, under Title VII, the State Department may not discriminate against women in their evaluations regardless of any demonstrated effect the evaluations ultimately can be shown to have on promotion opportunities. We need not now consider what remedy might be appropriate for discriminatory evaluations; the parties bifurcated the issues of liability and remedies.
To recapitulate, insofar as the District Court required appellants to prove discrimination in promotions in order to prove discrimination in evaluation reports, the District Court erred as a matter of law in two significant respects. First, the District Court unreasonably rejected a major portion of appellants' evidence that the promotion decisions at issue were infected with a discriminatory motive. Second, the District Court deprived appellants of their right under Title VII to bring a disparate treatment claim as to evaluations, regardless of how those evaluations might affect other employment decisions. Consequently, we remand to the District Court both the issue of whether the State Department discriminated against women in its decisions concerning promotions from class 5 to class 4 and the issue of whether it discriminated in its evaluations of the future "potential" of women FSOs.
Appellants brought disparate treatment claims with respect to various types of Foreign Service assignment decisions. We consider first appellants' claim that the State Department discriminated against women in "out-of-cone" assignments by overassigning women to positions in the consular cone and by underassigning women to the "prestigious" program direction cone. 616 F.Supp. at 1553-54.
1. Out-of-Cone Assignments
The District Court found that appellants' evidence disclosed the following facts about out-of-cone assignments to the consular cone:
616 F.Supp. at 1553-54 (¶ 101). Appellants contended that these extreme disparities resulted from the prevalent belief in the Foreign Service that women were especially suited for consular work. The government, in contrast, argued that the disparities resulted from the fact that women on the whole preferred consular assignments, and the Foreign Service merely honored these preferences. The District Court accepted the government's explanation of the disparities:
Id. at 1554 (¶ 101). On this basis, the District Court found appellants' statistical evidence "unconvincing" and concluded that appellants had failed to prove sex discrimination in out-of-cone assignments to the consular cone. Id. at 1560 (¶ 22).
It is true, as the District Court pointed out, that assignments are made in part pursuant to the bid lists submitted by members of the Foreign Service. But as the District Court acknowledged, bid lists were only one element of the assignment process, and the selection boards based their assignment decisions in larger measure on the perceived needs of the bureaus to which the assignments were made. See, supra, Part I.A. Moreover, the Secretary submitted no evidence showing that more women than men preferred out-of-cone assignments to the consular cone. Appellants' Brief at 55. The Secretary, on appeal, concedes as much.
The Secretary, however, would have us affirm the District Court's decision on the grounds that "an analysis which ignores `preference' ... is simply not probative on this issue." Appellee's Brief at 55. This argument, however, is precluded by the Supreme Court's Bazemore decision. According to Bazemore, appellants' statistical evidence concerning out-of-cone assignments to the consular cone is probative of discrimination despite the fact that it did not include individual preferences as a possible explanatory factor. There was no basis in the record on which the District Court could assume that women indicated preferences for consular work more frequently than men did. Consequently, the District Court contravened the dictates of Bazemore by refusing to credit the appellants' statistical evidence. Under Bazemore and Segar, the District Court is not entitled to dismiss plaintiffs' statistical evidence on mere conjecture.
With respect to out-of-cone assignments to the program direction cone, the District Court found that appellants' evidence showed that "38.5 percent of all out-of-cone assignments received by men in the political cone were to senior program direction cone positions, while only 14.6 percent of the out-of-cone assignments received by women in the political cone were to program direction cone positions." 616 F.Supp. at 1554 (¶ 105a). The District Court further found that this underselection of women measured 4.46 standard deviations, id., which means that an underselection or overselection of women of this magnitude or larger would occur randomly less than one time in 100,000.
Appellants' evidence also demonstrated that "12.4 percent of the out-of-cone assignments received by men in the consular cone were to program direction positions, while only 6.6 percent of the out-of-cone assignments received by women in the consular cone were to program direction positions." 616 F.Supp. at 1554 (¶ 105b). This underselection of women measured 2.23 standard deviations, id., which means that the probability of women being randomly either underselected or overselected to this degree or greater is about 2.6%.
The appellants argued that this underassignment of women to program direction cone positions from the political and consular cones resulted from the discriminatory belief within the Foreign Service that women were unsuitable for prestigious leadership-track positions. It is unclear from the District Court's opinion why the District Court rejected this argument, and found, to the contrary, that the State Department did not discriminate against women in assignments from the political and consular cones to the program direction cone. The District Court did observe that "Defendant's expert produced an analysis indicating that, as to those men and women who did attain transfer to the Program Direction cone, there was no disparity in the amount of time spent in class before attaining the transfer." 616 F.Supp. at 1554 (¶ 106). Although the District Court found this evidence to "indicate that females are not discriminated against in their attainment of conversion to the Program Direction cone," it concluded, accurately, that this evidence could not be "dispositive" because "it measures the time in class and service of those who actually attain the Program Direction cone, and plaintiffs complain of a disparity in the number of men and women who are given out-of-cone assignments to positions which carry the program direction skill code and would thus qualify them for transfer to the Program Direction cone itself." Id. (¶ 107). The issue was not whether those women who were able to transfer to the program direction cone did so with the same speed as their male counterparts; rather, the issue was whether proportionally fewer women than men
Despite the District Court's concession that appellee's rebuttal evidence could not be "dispositive," it offered no other basis for rejecting appellants' claim of discrimination in out-of-cone assignments to the program direction cone positions. Specifically, it did not mention individual preference as a possible nondiscriminatory explanation for the disparity between men and women in their selection rates for these positions, probably because there was absolutely no evidence in the record indicating that women preferred assignment to the "prestigious" program direction cone less than men.
Thus, we conclude that the District Court failed to articulate any sufficient grounds for rejecting appellants' proof of discrimination in out-of-cone assignments to the program direction cone. The sole basis offered by the government was properly found by the court to be insufficient. It cited no other basis in the record for its decision, and we can find none. Therefore, we reverse and remand the issue for reconsideration, on the basis of the existing record. The inference of discrimination raised by the significant disparities between men and women given out-of-cone assignments to these "prestigious" positions is thus far unrebutted. Unless the District Court can find a valid basis supported in the record for rejecting the inference of discrimination, it must rule in favor of the appellants on this claim.
2. Stretch and Downstretch Assignments
The appellants also claim that the State Department discriminated against women in "stretch" and "down-stretch" assignments. The evidence that appellants introduced at trial in support of this claim included the following statistics. First, between 1976 and 1981, "32.2% of the women in Class 4 were given downstretch assignments, while only 17.6% of the men in that class were given down-stretch assignments." 616 F.Supp. at 1552 (¶ 92). As the District Court noted, this disparity measures 6.72 standard deviations, id., and the chances of women being randomly overassigned or underassigned to this degree or greater are less than one in ten billion. See D.B. Owens, Handbook of Statistical Tables 13 (1962) (Plaintiffs' Exhibit 168).
Second, "20.8% of the women in Class 5 received down-stretch assignments, while only 14.2% of the men received them. This difference measures 4.04 standard deviations." 616 F.Supp. at 1552-53 (¶ 92). The probability of a random overselection or underselection of women of this magnitude or larger is about 1 in 20,000. See Plaintiffs' Exhibit 168 at 13.
Third, 19.9% of the women in class 7 received down-stretch assignments, whereas only 14.3% of the men in class 7 did. This disparity measured 2.39 standard deviations, which corresponds to a (two-tailed) probability value of about 1.6%. See Plaintiffs' Exhibit 57; Elementary Statistics, supra n. 8, at 479.
Fourth, with respect to stretch assignments, only 19.1% of women in class 4 received stretches, whereas 28.4% of the men in class 4 did. This underselection of women measured 3.74 standard deviations, which means that the probability of either an underselection or overselection of women of this magnitude or larger resulting from chance is about one in 5,000. See Plaintiffs' Exhibits 57, 168.
Fifth, only 31.6% of women in class 5 received stretch assignments, whereas 37.7% of the men in class 5 did. This disparity measured 2.79 standard deviations, which corresponds to a (two-tailed) probability value of 0.52%. See Plaintiffs' Exhibit 57; Elementary Statistics, supra, n. 8, at 479.
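For illustration, the five disparities listed above can be converted from standard deviations into two-tailed probabilities in a few lines. This is a minimal sketch and not part of the record: it assumes each disparity can be treated as a z-score under a standard normal distribution, the labels and function name are illustrative, and the computed values may differ slightly from the rounded figures read from the printed tables cited in the text.

```python
import math


def two_tailed_p(z: float) -> float:
    """Two-tailed probability for a disparity of z standard deviations.

    Uses the identity 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)), which keeps
    precision in the far tail where subtracting from 1 would round off.
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


# Disparities in standard deviations, as reported in the text above.
disparities = {
    "class 4 downstretch": 6.72,
    "class 5 downstretch": 4.04,
    "class 7 downstretch": 2.39,
    "class 4 stretch": 3.74,
    "class 5 stretch": 2.79,
}

for label, z in disparities.items():
    p = two_tailed_p(z)
    print(f"{label}: {z:.2f} SD, p = {p:.2e}, about 1 in {1 / p:,.0f}")
```

Expressing each probability as "1 in N" makes the gap between the marginal and the extreme disparities easier to compare on a single scale.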
The appellants argued that this overassignment of women to downstretch positions and underassignment of women to stretch positions resulted from unlawful sexist attitudes in the Foreign Service. As additional evidence to support their contention, the appellants pointed to a 1977 report prepared within the State Department, which stated that stretch assignments "are not commonly given to those in EEO categories," meaning women and minorities.
The District Court rejected this claim on several grounds. First, the District Court stated that appellants had failed to show that the overassignment of women to downstretch positions and underassignment of women to stretch positions adversely affected the opportunities of these women for promotion. See 616 F.Supp. at 1553 (¶ 94). Once again, we repeat that appellants are entitled under 42 U.S.C. § 2000e-16 to bring a claim of sex discrimination with respect to "all personnel actions," including any category of assignments, regardless of how these assignments relate to other personnel actions, such as promotion decisions. By relying on this determination, the District Court contravened the express provisions of Title VII.
Second, the District Court concluded that appellants' statistical evidence was "of little value in persuading that discrimination existed in assigning stretch and downstretches" because, in part:
616 F.Supp. at 1553 (¶¶ 96, 98).
While it is absolutely true that officers in any given class will be competing against officers from other classes, it is also absolutely irrelevant to the point of appellants' evidence. Appellants are trying to demonstrate, for example, that women in class 5 are less likely than men in class 5 to stretch into assignments labelled class 4 or higher, and that this disparity results from a widespread prejudice within the Foreign Service that women are less able than men despite their equivalent rank. Given this purpose, it is entirely irrelevant that officers from other classes may compete with men and women in class 5 for those assignments that are stretches for officers in class 5. Appellants are not interested in comparing how well the men and women in class 5 compete against officers in another class. They are only interested, and properly so, in how similarly situated men and women compete against each other.
It was an error of law for the District Court to reject the probative value of appellants' statistical evidence because of this irrelevant factor of "cross-class competition." Certainly, the Supreme Court's decision in Bazemore stands for the proposition that the "missing factor" identified by the District Court as a reason for discounting statistical proof of disparate treatment must at least be relevant to the point of the statistics. In Bazemore itself, the Supreme Court noted that "certain conclusions of the District Court are inexplicable in light of the record." 106 S.Ct. at 3011 n. 15. For instance,
Id. In this case, the District Court's reliance on the omission of "cross-class competition" as a basis for rejecting appellants' evidence of discrimination in stretch and downstretch assignments is similarly "inexplicable."
Third, the District Court found appellants' statistics concerning stretch and downstretches to be "flawed" in another respect. The data from which the statistical analysis was made was tabulated in terms of the total number of years each FSO served in a stretch or a downstretch assignment rather than in terms of the number of such assignments. The District Court found that this methodology "does
Finally, the District Court found that "Plaintiffs' analysis did not allow for the preference of the individual FSO." 616 F.Supp. at 1553 (¶ 97). The role of this "preference" argument in the context of stretch and downstretch assignments differs significantly from its role in the context of out-of-cone assignments. To recall, the District Court had no evidence for believing that women more than men would prefer out-of-cone assignments to the consular cone and that this preference, rather than a discriminatory treatment of women, best explained the disparities in out-of-cone assignments. Here, in contrast, there is some evidence that women preferred downstretch assignments more than men did. As the District Court states, the record contains "testimony that down-stretch assignments are requested for various reasons, including the desire to gain an assignment with a spouse who is also a State Department employee." Id. If this testimony were indeed "extensive," as the District Court characterized it, we would conclude that the District Court's decision that the State Department did not discriminate in stretch and downstretches was not clearly erroneous. But we can find in the record only two instances in which a woman FSO subordinated her own career in favor of her husband's Foreign Service career, and in one of these the witness testified that her decision was part of an alternating practice, agreed to with her husband, of trading off less desirable assignments. Compare Appellee's Brief at 53-54 n. 58 with Tr. 876, 1765, 2150. These two (or more accurately, one and a half) isolated instances do not amount to "extensive" testimony. Alone they do not establish a sufficient basis for undermining the probative weight of appellants' statistics. We must recall that some of the disparities between men and women in downstretch assignments were especially extreme, measuring 6.72 and 4.04 standard deviations.
Given these kinds of numbers, it takes more than a few isolated examples of individual decisions by women to seek downstretches for the District Court not to conclude that, more likely than not, the disparities resulted from unlawful discrimination. Therefore, from our review of the totality of the evidence presented on the issue of discrimination in stretch and downstretch assignments, we must conclude that the District Court's finding of no discrimination was clearly erroneous. We reverse the District Court's decision on this issue of liability and remand for appropriate proceedings on the question of remedies.
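To give a sense of scale for disparities of 6.72 and 4.04 standard deviations, the following sketch converts a disparity stated in standard deviations into the two-tailed probability of a disparity at least that large arising by chance. It assumes a normal approximation, which is an assumption of this illustration rather than a finding in the record; the function name is likewise illustrative.

```python
import math


def two_tailed_probability(standard_deviations: float) -> float:
    """Chance of a deviation at least this large, in either direction,
    under a standard normal approximation (assumed for illustration)."""
    return math.erfc(abs(standard_deviations) / math.sqrt(2.0))


# The two downstretch disparities cited in the text.
for z in (4.04, 6.72):
    p = two_tailed_probability(z)
    print(f"{z} standard deviations: chance probability about {p:.1e}")
```

Under this approximation, both figures correspond to chance probabilities well below one in ten thousand.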
3. Deputy Chief of Mission Assignments
Appellants also claim that the State Department discriminated against women in selecting Deputy Chiefs of Mission. The Deputy Chief of Mission (DCM) is the second in command, directly below the Ambassador, at each American embassy. As the District Court found, appellants introduced evidence showing that only "nine women were appointed DCM between 1972 and 1983, out of a total of 586 appointments." 616 F.Supp. at 1552 (¶ 88). The District Court then noted:
Id. The probability of a disparity this large or larger, either favoring or disfavoring women for the DCM position, resulting by chance in a selection process that did not differentiate between men and women, is about one in 2,500. Given this extremely low probability, this evidence, standing alone, raises a strong inference of disparate treatment.
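For readers who prefer to think in standard deviations, the sketch below translates a two-tailed chance probability such as one in 2,500 into the equivalent number of standard deviations. It assumes a normal approximation and uses only the Python standard library; the function name is illustrative and does not come from the record.

```python
from statistics import NormalDist


def equivalent_standard_deviations(two_tailed_p: float) -> float:
    """Number of standard deviations whose two-tailed normal
    probability equals two_tailed_p (normal approximation assumed)."""
    return NormalDist().inv_cdf(1.0 - two_tailed_p / 2.0)


z = equivalent_standard_deviations(1 / 2500)
print(f"one in 2,500 corresponds to roughly {z:.2f} standard deviations")
```

On this approximation, a one-in-2,500 probability lies around three and a half standard deviations from the expected value.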
The District Court offered several reasons for concluding that the State Department did not discriminate against women in DCM assignments. All of these reasons are erroneous as a matter of law. First, the District Court found this evidence "unconvincing" because appellants were unable to show "statistically significant disparit[ies]" in the selection rates for five other "high visibility positions." 616 F.Supp. at 1560 (¶ 20). (The other "high visibility" positions were: Deputy Assistant Secretary, Office Director, Country Director, Principal Officer, and Executive Director.)
We again note that under 42 U.S.C. § 2000e-16 appellants are not required to prove sex discrimination in assignments to six different types of jobs in order to establish discrimination in assignments to a single position. We have, however, also said that evidence of nondiscrimination in some jobs may be probative of whether discrimination occurred in selections for another kind of job. Adherence to both these legal rules may be difficult at times. But in this case it is clear that the District Court contravened the first of these two legal rules. Here, appellants introduced evidence showing that the underselection of women for DCM positions was so extreme that the chance of women being randomly underselected or overselected to this degree or greater was only one in 2,500. Not even a stipulation that the State Department did not discriminate against women in assignments to five other kinds of "high visibility" positions could defeat the inference of disparate treatment raised by this evidence. A defendant must produce other evidence directly relating to the job at issue to rebut this inference of discrimination. In this case, the District Court rejected appellants' strong inference of disparate treatment in part because appellants did not generate an inference of discrimination in five other types of assignments. This was legal error.
Second, the District Court stated:
616 F.Supp. at 1560 (¶ 19). It is not clear what the District Court meant by this statement. As we have seen, the District Court elsewhere acknowledged that appellants' statistical analysis was "based on the number of women in the grade levels from which DCM's were chosen." Id. at 1552 (¶ 88). Thus, according to the District Court itself, the appellants properly limited their study to the relevant applicant pool and therefore controlled for the fact that not many women in the Foreign Service had reached a position in which they were eligible for appointment as Deputy Chief of Mission. What else, then, could the District
Third, the District Court found that "[plaintiffs'] statistical analysis is of little significance in that it encompasses the period 1972 through 1983, while the relevant time period for this case is 1976 to 1983." 616 F.Supp. at 1552 (¶ 89). This determination is directly contrary to the precise holding of the Bazemore decision. As discussed in Part III. B., the Supreme Court found that the inclusion of pre-Act data in a statistical study did not undercut the probative value of that study.
Thus, the three reasons the District Court gave for rejecting appellants' strong inference of disparate treatment in DCM assignments are inadequate as a matter of law. On appeal, the Secretary suggests an alternative nondiscriminatory explanation for the underselection of women to this position: more women might have been appointed Ambassador instead. Appellee's Brief at 57. We note that the District Court made no such finding and the only evidence in the record to which the Secretary directs us is a statement by a single witness that perhaps this fact might explain the underselection of women for DCM positions. Tr. at 1766. We think that the proper course under Pullman-Standard is to remand the issue to the District Court for further factfinding, on the basis of the existing record.
C. The Superior Honor Award
The appellants also claim that the State Department discriminated against women in granting the Superior Honor Award to Foreign Service Officers. As the District Court found, appellants presented the following evidence:
616 F.Supp. at 1548 (¶ 48). The chances are only one in 500 that a deviation of this magnitude or larger, either favoring or disfavoring
Once again, the reasons that the District Court gave for rejecting appellants' discrimination claim are contrary to law. First, the District Court stated that appellants failed to show how "the failure of women to receive the Superior Honor Award affected the opportunity for promotion." Id. (¶ 49). Appellants, however, are entitled to bring a sex discrimination claim under 42 U.S.C. § 2000e-16 with respect to personnel decisions involving awards regardless of how these decisions affect promotions. As we have seen, the Foreign Service Act of 1980 specifically includes "any ... award of performance pay or special differential" as among the personnel actions that must be free from sex discrimination, and we do not construe "all personnel actions" in 42 U.S.C. § 2000e-16 to have a lesser scope.
Second, the District Court rejected appellants' claim involving the Superior Honor Award as "unconvincing" because the appellants were unable to produce equivalent evidence with respect to other State Department Honor Awards. But as with the evidence concerning the DCM assignments, appellants' evidence concerning the Superior Honor Award is sufficiently strong to withstand even a stipulation that the State Department did not discriminate against women in granting other types of Honor Awards. To rebut the inference of discrimination here, the State Department was required to present evidence explaining the extreme disparity between the numbers of men and women receiving the Superior Honor Award.
Third, the District Court discredited appellants' evidence because the District Court thought that appellants' statistical "analysis was based on a faulty assumption that all female FSO's were equally qualified for the Superior Honor Award." 616 F.Supp. at 1548 (¶ 50). But appellants' evidence assumes nothing of the sort. The District Court apparently thought that appellants made this "faulty assumption" because, in the court's own words, appellants made "no showing ... of what portion of female FSO's were qualified for the Superior Honor Award." Id. But the statement reveals a fundamental misunderstanding of the role of relevant statistical evidence in a Title VII case. Appellants do not suggest that one FSO is as qualified to receive an award as another. These awards are obviously based on merit and are supposed to be given to only the outstanding FSOs. Appellants merely assume that the ranks of men and women FSOs would produce these outstanding individuals at (roughly) equal rates, and the State Department offered no reason for rejecting this assumption. Appellants' statistical analysis is based on the contention that if the State Department awarded this prize without bias against women, the percentage of eligible women receiving the award would be the same as the percentage of eligible men receiving the award (and thus the male/female ratio among award recipients would be the same as the male/female ratio in the pool of eligible candidates). Appellants properly limited their analysis to only FSOs in classes 1 through 5, because only FSOs in those classes received this award.
Given that appellants limited their statistical analysis to the relevant pool, and the analysis revealed an underselection of women measuring 3.1 standard deviations, the inference of disparate treatment generated by this evidence is entitled to stand unless and until the government presents a credible nondiscriminatory explanation of why men in classes 1 through 5 more frequently received the Superior Honor Award than women in the same classes. See, supra, n. 6. By stating that appellants had established no basis for comparing actual awards with expected awards, and in believing that appellants assumed all female FSOs equally qualified for the award, the District Court revealed a failure to understand the way in which statistics can prove discrimination in a Title VII case. Therefore, we reverse for legal error.
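The comparison of actual awards with expected awards can be sketched as follows. The sketch assumes a simple binomial model in which each award goes to a woman with probability equal to women's share of the eligible pool; that model, the function name, and the counts used below are assumptions made for illustration and are not taken from the record.

```python
import math


def selection_z_score(pool_women: int, pool_total: int,
                      awards_women: int, awards_total: int) -> float:
    """Standard deviations separating the actual number of awards to
    women from the number expected under sex-neutral selection
    (binomial model, assumed for illustration)."""
    share = pool_women / pool_total            # women's share of the eligible pool
    expected = awards_total * share            # awards expected to go to women
    std_dev = math.sqrt(awards_total * share * (1.0 - share))
    return (awards_women - expected) / std_dev


# Hypothetical counts for illustration only; NOT figures from this case.
z = selection_z_score(pool_women=200, pool_total=2000,
                      awards_women=10, awards_total=250)
print(f"disparity of {z:.2f} standard deviations")
```

A negative result indicates that women received fewer awards than the eligible pool would predict. The calculation makes no assumption that every individual in the pool is equally qualified; it assumes only that qualified individuals arise at equal rates among eligible men and women.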
Moreover, because the State Department did not offer any explanation for the disparity between men and women in receiving
V. INITIAL CONE ASSIGNMENTS: THE CLAIM INVOLVING THE DISPARATE IMPACT THEORY
Appellants characterize their claim concerning initial cone assignments as both a disparate treatment and a disparate impact claim. This characterization, unfortunately, lacks a certain degree of clarity and may indicate some confusion on the appellants' part. Perhaps this confusion stems from the fact that the initial cone assignments involve two distinct groups of FSOs: those who took entrance exams and those who did not. See, supra, Part I & n. 2. It appears that appellants wish to bring a disparate treatment claim on behalf of both these groups and a disparate impact claim on behalf of the exam-takers. The appellants introduced statistical evidence of a disparity in initial cone assignments for which the pool was both the exam-takers and the nonexam-takers. Appellants' Brief at 22. This study was based on data supplied by the State Department. Id. The appellants also introduced statistical evidence of a disparity in the initial cone assignments for the exam-takers alone. Id. at 24. This study, by contrast, was based on data supplied by the Educational Testing Service (ETS) which administers the Foreign Service entrance exams and monitored the test results. Id. (The appellants apparently did not introduce any evidence regarding the nonexam-takers alone.) We do not believe, however, that in this case the appellants can pursue both a disparate treatment and a disparate impact claim with respect to the exam-takers' initial cone assignments. We will explain our reasons for this conclusion.
To apply the disparate treatment theory to the evidence concerning exam-takers, the appellants must allege and prove that the observed, nonrandom disparities were caused by intentional discrimination against women. To apply the disparate impact theory, the appellants must allege and prove that the disparities were caused by a "facially neutral" selection criterion that disadvantaged women more than men. Here, the appellants point to the political functional field portion of the Foreign Service Entrance Examinations. They have introduced evidence that from 1975 to 1980 men received higher scores than women on this test and that statistical analysis rejects the hypothesis that this disparity was a random sample of the deviation that would normally occur if men and women tested equally. See 616 F.Supp. at 1546 (¶ 27).
Of course, the appellants might have presented alternative claims: e.g., the disparity in initial cone assignments was caused either by discriminatory intent, or by the results of the entrance examinations. Nothing in Title VII or the Federal Rules of Civil Procedure prevents appellants from pursuing alternative claims or theories, even if they are mutually inconsistent.
Once over that initial hurdle, the resolution of appellants' disparate impact claim seems straightforward. The only basis which the District Court gave for rejecting appellants' statistical evidence that correlated test scores with initial cone assignments was that these statistics were "flawed and inconclusive." 616 F.Supp. at 1561 (¶ 28).
Id. at 1546 (¶ 29). Unfortunately, this finding of fact is itself flawed. Although the District Court is correct in saying that there was some confusion about the correct data for 1981 in some of appellants' statistics, this confusion did not involve the specific statistical studies relevant to the disparate impact claim involving the entrance examination: the data which were supplied by ETS. There was no dispute about the accuracy of this data. The confusion over the 1981 numbers arises from data supplied by the State Department's employment records. The State Department data were used in appellants' statistical studies involving both exam-takers and nonexam-takers and this evidence was unnecessary for the disparate impact claim involving exam-takers only.
Notably, the one obvious defense that the State Department never raised was that there was a legitimate "business" necessity for the test. Indeed, the District Court specifically found that "[d]efendant did not rely on a showing that the political functional field test was job related." 616 F.Supp. at 1546 (¶ 31). Thus, if the District Court concludes that the examination caused the disparity in initial cone assignments, the District Court must conclude that the test violated Title VII. See, e.g., Albemarle Paper Co. v. Moody, 422 U.S. 405, 95 S.Ct. 2362, 45 L.Ed.2d 280 (1975).
We have reviewed the District Court's decision in this case in detail and have concluded that it committed a number of legal errors and made several clearly erroneous findings of fact. Consequently, we reverse the judgment of the District Court and remand this action for further proceedings not inconsistent with this opinion. With respect to a number of the appellants' claims, we have held that the determination of liability under Title VII requires further factfinding by the District Court, to be conducted on the basis of the existing record. See C. Wright & A. Miller, Federal Practice and Procedure § 2577 (1971). We offer no views at this point on any issues relating to the remedies phase of this litigation.
It is so ordered.
We are not expert statisticians and we discuss statistics only insofar as necessary to give a comprehensible explanation of our view of the proper application of Title VII law to the facts of this case. Nor do we pretend to cover all of the issues that relate to the use of statistics in a Title VII case. For example, we note that there are various methods for deriving a "test statistic" measured in numbers of "standard deviations": the z-test, the t-test, etc. We have no opinion on the choice of these methodologies as this case does not call them into question. Similarly, we are aware that our discussion of statistics requires sufficiently "large" samples in order to be accurate; we have avoided the "small sample problem" because apparently none of the claims on appeal here involves small samples.
This approach follows Baldus and Cole in viewing disparities between 1.65 and 1.96 standard deviations as falling into an "intermediate" zone. See Baldus & Cole (Supp.) at 131-32. Numbers in this intermediate range go some of the way toward establishing a prima facie case of discrimination, but they cannot make the distance on their own. But cf., Meier, Sacks & Zabell, supra n. 9, at 12 (the appropriate intermediate zone falls between 1.96 and 2.33 standard deviations).
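The zone described in this note can be expressed as a simple classification. The sketch below uses the 1.65 and 1.96 thresholds stated above; the function name, the labels, and the treatment of values falling exactly on a threshold are assumptions made for illustration.

```python
def disparity_zone(standard_deviations: float) -> str:
    """Place a disparity relative to the 1.65 to 1.96 intermediate zone.
    Boundary handling is an assumption of this sketch."""
    magnitude = abs(standard_deviations)
    if magnitude < 1.65:
        return "below the intermediate zone"
    if magnitude <= 1.96:
        return "intermediate zone"
    return "above the intermediate zone"


for z in (1.2, 1.8, 3.1):
    print(f"{z} standard deviations: {disparity_zone(z)}")
```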
106 S.Ct. at 3002. As Justice Brennan's opinion reflects the reasoning of the unanimous Court, we have dispensed with the conventional practice of citing to it as a concurring opinion.
We note also that leading commentators support this corollary to the Bazemore rule. Baldus and Cole emphasize that "when otherwise relevant evidence is challenged on methodological grounds, the burden should normally be on the challenger (a) to present credible evidence that the statistical proof is defective and (b) to present a plausible explanation of how the asserted flaw is likely to bias the results against his or her position." D. Baldus & J. Cole, Statistical Proof of Discrimination at vii (1986 Supp.).
106 S.Ct. at 3010-11 n. 14. Similarly, here the State Department presented no evidence at all that preference would explain the disparities related to sex.