OPINION AND ORDER
SHIRA A. SCHEINDLIN, District Judge.
In stark contrast to the cuteness and humor of the cartoons at the heart of this trademark dispute, presently before the Court are dueling motions in limine to preclude dueling expert testimony. Each side argues that the other's expert survey concerning consumer confusion is so methodologically unsound as to render the survey and accompanying testimony inadmissible. No survey is perfect and the limits and flaws of a survey generally go to evidentiary weight and do not warrant exclusion. Exclusion may be justified, however, where a single error or the cumulative errors are so serious that the survey is unreliable or insufficiently probative. For the reasons stated below, I conclude that THOIP's survey is inadmissible and Disney's survey is admissible.
THOIP claims rights in an unregistered trademark consisting of "LITTLE MISS" with a character trait in big, bold, capital letters plus a character.1 For example:
According to THOIP, the mark was first used in the United States on a series of children's books—to which THOIP acquired rights in 2004.2 "From 1981 to the present, over 35 LITTLE MISS characters have been created and featured in at least 75 books, each book with a title character prominently featured on the cover, as well as various television series and videos."3 In addition to marketing the Little Miss—and related Mr. Men—books, THOIP has extensively licensed "the images, characters, stories and settings from the books ... for a wide range of uses worldwide[,]"4 including for an assortment of merchandise.5
In the summer of 2006,6 a THOIP licensee launched a line of T-shirts featuring the Little Miss "images and names in a format taken from the iconic cover of each book."7 The shirts are faux-distressed so as to look and feel vintage.8 They are sold at boutiques such as Kitson; national retail chains such as Bloomingdale's, Gap Kids, Hot Topic, Macy's, Nordstrom, Urban Outfitters, and Walmart; and online.9 Additionally, beginning in May 2007, THOIP's shirts were available at Vault 28—a boutique inside the Downtown Disney shopping complex, which lies outside of the Disneyland theme park in Anaheim, California.10 THOIP's shirts were also sold at Disney World in Orlando, Florida, specifically at Epcot Center's United Kingdom Pavilion and at the Virgin store in the Downtown Disney complex.11
In this suit, THOIP contends under section 43(a) of the Lanham Act and common law that its mark was infringed by two lines of T-shirts of the Walt Disney Company, Disney Consumer Products, Inc., and Disney Destinations, LLC (collectively "Disney").12 The first line, the so-called "Little Miss Disney" line, was launched in February 2008 and consisted of four different shirts: "Little Miss Bossy" with Daisy Duck, "Little Miss Perfect" with Minnie Mouse, "Little Miss Sassy" with Tinkerbell, and "Little Miss Wicked" with the Queen from the tale of Snow White.13
THOIP puts out a "Little Miss Bossy" shirt with its character, but has not used the specific adjectives perfect, sassy, or wicked.14
The Little Miss Disney shirts were sold at Disney theme parks, including Disneyland and Disney World, and at the World of Disney store in Manhattan.15
THOIP alleges that the Little Miss Disney shirts are infringing because both companies' shirts:
(1) use as their most prominent term the consistent formative LITTLE MISS element; (2) followed by a personality trait, usually self-deprecating; (3) rendered in identical sans-serif block letter typefaces, taken from the MR. MEN and LITTLE MISS book covers; (4) alongside a cartoon character visually portraying the relevant personality trait; (5) the shirts are rendered in faux-distressed style; [and] (6) the shirts are made from a fabric made to appear and feel well-worn and soft.16
The second line of allegedly infringing Disney shirts, the "Miss Disney" line, was launched in October 2007 and consisted of four different shirts: "Miss Chatterbox" with Minnie Mouse, "Miss Fabulous" with Minnie Mouse, "Miss Attitude" with Tinkerbell, and "Miss Adorable" with Marie the Cat.17
THOIP puts out a "Little Miss Chatterbox" shirt with its character, but has not used the terms fabulous, attitude, or adorable.18
The Miss Disney shirts were sold by many of the same national retail chains as the Little Miss THOIP shirts, as well as online.19 Two Miss Disney shirts—Miss Attitude and Miss Fabulous—were also available at Vault 28 within Downtown Disney in Anaheim.20
Though THOIP does not use "Miss" without the modifier "Little",21 THOIP alleges the Miss Disney shirts nonetheless are an unlawful infringement upon its mark.22
A. The Ford Survey
In support of its claims, THOIP proffers a survey from its retained expert Dr. Gary Ford that purports to examine whether consumers perceive two Little Miss Disney and two Miss Disney shirts to be emanating from, associated with, or permitted by THOIP.23
1. Design and Operation
Dr. Ford conducted a two-room "sequential array" survey in which respondents were shown, in room one, a THOIP shirt, and, in room two, an array of five shirts including an allegedly infringing Disney shirt (or control shirt) and four non-infringing "filler" shirts.24 Each respondent participated in one of eight different cells.25 Cells One, Three, Five, and Seven were "treatment" cells in which a THOIP shirt was compared to an array that included an accused Disney shirt.26 Dr. Ford testified that he paired a specific THOIP shirt with an allegedly infringing Disney shirt based on resemblance.27
Cells Two, Four, Six, and Eight were "control" cells—each corresponding to the immediately preceding treatment cell.28 In the control cells, a THOIP shirt was compared to an array of shirts in which the allegedly infringing Disney shirt from the corresponding treatment cell was replaced with a shirt bearing the same Disney character but omitting the words.29
More specifically to survey operation, a respondent was shown in room one a specific THOIP shirt and was told to look at it as if deciding whether to purchase it.30 When the respondent was finished looking at the shirt, the shirt was removed from view and the interviewer asked some unrelated questions intended to clear short-term memory of the THOIP shirt.31
The interviewer then escorted the respondent into a second room in which she was shown the five-shirt array.32 The interviewer instructed the respondent to look at the shirts as if considering whether to purchase any of them.33 The order of the array remained constant across respondents within a cell; the allegedly infringing Disney shirt (or control shirt) was always in the middle of the display.34 None of the shirts in the survey contained a neck label or any other indicia of origin.35
When the respondent finished looking over the array, the interviewer asked her a series of questions to determine whether she thought one or more of the shirts in the array emanated from, was associated with, or was permitted by THOIP.36 For instance, a respondent was asked: "Do you think one or more of these products is put out by the same company that put out the shirt I showed you earlier or that none of these products is put out by the same company that put out the shirt I showed you earlier or don't you know?"37
As an example, Cell One—which tested THOIP's Little Miss Bossy shirt with Disney's Little Miss Bossy shirt—was presented as follows:
As shown above, the array in room two consisted of Minnie Mouse, Lucy from Peanuts with the word "Lucy" above her image, Disney's allegedly infringing Little Miss Bossy with Daisy Duck, Dora the Explorer with the words "Dora the Explorer" above her image, and Hello Kitty.38
The array from Cell Two—the control cell corresponding to Cell One—was as follows:
Cell Three tested THOIP's Little Miss Chatterbox shirt with Disney's Miss Chatterbox shirt; Cell Four was the control version of Cell Three.39 Cell Five tested THOIP's Little Miss Splendid shirt with Disney's Miss Fabulous shirt; Cell Six was the control version of Cell Five.40 Cell Seven tested THOIP's Little Miss Splendid shirt with Disney's Little Miss Perfect shirt; Cell Eight was the control version of Cell Seven.41
2. Coding and Results
Dr. Ford classified a respondent as confused as to source, association, or permission if she identified one of the following reasons for selecting the allegedly infringing Disney shirt:
a) [b]ecause the shirt said Little Miss Bossy, Miss Chatterbox or Little Miss Splendid and/or Miss Fabulous and Little Miss Splendid and/or Little Miss Perfect or some variant of those terms; b) [b]ecause the shirt said "Little Miss" or "Miss;" c) [b]ecause the shirt said the same name, or same wording in both, same slogan or a similar phrase; or d) [b]ecause the shirt [sic] the same lettering, the lettering looks the same, or a similar response related to the lettering or writing on the shirt.42
Respondents were not classified as confused if they "gave a response such as `it is a cartoon character,' `same fabric' or similar responses or gave responses that may have had multiple meanings, such as `same logo.'"43
Based on his coding, Dr. Ford found: (1) 27.5 percent of respondents perceived Disney's Little Miss Bossy shirt to be from a source that was the same as, associated with, or permitted by the company that put out THOIP's Little Miss Bossy shirt;44 (2) no respondents perceived the control shirt in Cell Two to be from a source that was the same as, associated with, or permitted by the company that put out THOIP's Little Miss Bossy shirt;45 (3) 25.3 percent of respondents perceived Disney's Miss Chatterbox shirt to be from a source that was the same as, associated with, or permitted by the company that put out THOIP's Little Miss Chatterbox shirt;46 (4) no respondents perceived the control shirt in Cell Four to be from a source that was the same as, associated with, or permitted by the company that put out THOIP's Little Miss Chatterbox shirt;47 (5) 14 percent of respondents perceived Disney's Miss Fabulous shirt to be from a source that was the same as, associated with, or permitted by the company that put out THOIP's Little Miss Splendid shirt;48 (6) no respondents perceived the control shirt in Cell Six to be from a source that was the same as, associated with, or permitted by the company that put out THOIP's Little Miss Splendid shirt;49 (7) 14.6 percent of respondents perceived Disney's Little Miss Perfect shirt to be from a source that was the same as, associated with, or permitted by the company that put out THOIP's Little Miss Splendid shirt;50 and (8) no respondents perceived the control shirt in Cell Eight to be from a source that was the same as, associated with, or permitted by the company that put out THOIP's Little Miss Splendid shirt.51
As discussed more fully below in Part IV.B, Disney argues that Dr. Ford's survey did not have an effective control, pointing out that none of the control respondents were classified as confused. Because Dr. Ford only counted as confused those respondents who made some reference to words or the style of the lettering on the shirt, and because none of the control shirts featured any lettering, "it was ... impossible under Dr. Ford's methodology for someone to have selected the control shirt in a way that Dr. Ford would have counted as confused."52
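Disney's point about the control cells can be illustrated with a minimal sketch of the original coding rule. The category labels below are hypothetical shorthand for criteria (a) through (d) and for the non-qualifying responses quoted above; they are not taken from Dr. Ford's report. Because every qualifying reason refers to words or lettering, a respondent who selects a wordless control shirt has no qualifying reason available.

```python
# Hypothetical labels paraphrasing Dr. Ford's original criteria (a)-(d).
# Each one refers to words or lettering on the shirt.
ORIGINAL_CONFUSION_REASONS = {
    "trait_phrase",         # (a) e.g. the shirt said Little Miss Bossy
    "little_miss_or_miss",  # (b) the shirt said "Little Miss" or "Miss"
    "same_wording",         # (c) same name, wording, slogan, or phrase
    "same_lettering",       # (d) same or similar lettering
}


def is_confused(selected_tested_shirt, reasons):
    """Return True if the respondent picked the tested shirt (accused or
    control) and gave at least one qualifying reason."""
    return selected_tested_shirt and bool(set(reasons) & ORIGINAL_CONFUSION_REASONS)


# Treatment respondent: picks the accused shirt because of its lettering.
treatment = is_confused(True, {"same_lettering"})

# Control respondent: the control shirt bears no words, so only non-lettering
# reasons (hypothetical labels) are available, and none of them qualifies.
control = is_confused(True, {"cartoon_character", "same_fabric"})

print(treatment, control)
```

Under this rule the control rate is zero by construction, which is the feature of the design that Disney criticizes.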
In response to this criticism, but without conceding that his original coding choices were incorrect, Dr. Ford "broadened [his] standards and recoded respondents as confused based on three additional criteria: e) [b]ecause of design; f) [b]ecause the shirts looked the same or similar; and g) [b]ecause it's the same type of pose or similar."53 Under Dr. Ford's more "liberal" standards, confusion increased in Cell One from 27.5 percent to 35.6 percent and in Cell Two from 0.0 percent to 8.6 percent with a net confusion (treatment cell minus control cell) of 27 percent for Disney's Little Miss Bossy shirt; confusion increased in Cell Three from 25.3 percent to 32 percent and in Cell Four from 0.0 percent to 7.2 percent with a net confusion of 24.8 percent for Disney's Miss Chatterbox shirt; confusion increased in Cell Five from 14 percent to 17.3 percent and in Cell Six from 0.0 percent to 5.9 percent with a net confusion of 11.4 percent for Disney's Miss Fabulous shirt; and confusion increased in Cell Seven from 14.6 to 17.9 percent and in Cell Eight from 0.0 percent to 6.6 percent with a net confusion of 11.3 percent for Disney's Little Miss Perfect shirt.54
Comparing Dr. Ford's original results to the recoded results, the net confusion for Disney's (1) Little Miss Bossy shirt decreased from 27.5 percent to 27 percent; (2) Miss Chatterbox shirt decreased from 25.3 percent to 24.8 percent; (3) Miss Fabulous shirt decreased from 14 percent to 11.4 percent; and (4) Little Miss Perfect shirt decreased from 14.6 percent to 11.3 percent.
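The net confusion figures follow from simple subtraction (treatment cell minus control cell). The short sketch below reproduces that arithmetic from the percentages reported above; the variable and function names are illustrative only.

```python
# Gross confusion rates in percent, as reported: (treatment cell, control cell).
original = {
    "Little Miss Bossy": (27.5, 0.0),
    "Miss Chatterbox": (25.3, 0.0),
    "Miss Fabulous": (14.0, 0.0),
    "Little Miss Perfect": (14.6, 0.0),
}
recoded = {
    "Little Miss Bossy": (35.6, 8.6),
    "Miss Chatterbox": (32.0, 7.2),
    "Miss Fabulous": (17.3, 5.9),
    "Little Miss Perfect": (17.9, 6.6),
}


def net_confusion(treatment, control):
    """Net confusion = treatment rate minus control rate, to one decimal."""
    return round(treatment - control, 1)


net_original = {shirt: net_confusion(*cells) for shirt, cells in original.items()}
net_recoded = {shirt: net_confusion(*cells) for shirt, cells in recoded.items()}

for shirt in original:
    print(f"{shirt}: {net_original[shirt]} -> {net_recoded[shirt]} percent net")
```

In each pairing the recoded control rate rises by more than the recoded treatment rate, which is why net confusion falls even though gross confusion increases.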
3. Rebuttal to Ford Survey
Disney proffers a report and a declaration from its retained expert, Dr. Itamar Simonson, to rebut the Ford Survey, including Dr. Ford's recoding efforts.55 Dr. Simonson's report and declaration are addressed in Part IV.
B. The Helfgott Survey
Disney proffers a survey from Dr. Myron Helfgott that, like Dr. Ford's, purports to study "whether or not [four of the accused shirts] are likely to cause consumers to think they are put out by, in association with, or with the permission of [THOIP]."56
1. Design and Operation
Dr. Helfgott conducted a so-called "Eveready" or "monadic" survey.57 Respondents were shown either an allegedly infringing Disney shirt or a control shirt, and then were queried about source, association, and permission.58 The four allegedly infringing shirts were the same as those tested by Dr. Ford: Little Miss Bossy with Daisy Duck, Miss Chatterbox with Minnie Mouse, Miss Fabulous with Minnie Mouse, and Little Miss Perfect with Minnie Mouse.59 The four control shirts featured the same Disney character as the corresponding accused shirt, but the words "Little Miss" or "Miss" were omitted and the character trait was replaced with a different (though definitionally similar) term—pushy, motormouth, marvelous, and flawless.60
Thus, unlike Dr. Ford's control shirts which contained no verbiage, Dr. Helfgott's control shirts included a single descriptive term.
Respondents were told to look over the allegedly infringing or control shirt as if considering whether to buy the shirt.61 The interviewer then asked the following questions: (1) "What company do you think puts out this T-shirt, or don't you know?"62 (2) "Do you think the company that puts out this T-shirt puts it out themselves, or in association with some other company, or don't you know?"63 And (3) "Do you think the company that puts out this T-shirt got permission from some other company, did not get permission from some other company, or don't you know?"64 If the respondent answered any of the above questions with a company name, a follow-up question was asked: "What in particular about this T-shirt makes you think [that]?"65 Where the respondent mentioned at least one company, she was asked about the products of each company mentioned.66
2. Coding and Results
Respondents were coded as confused if they identified any of the following in response to questions (1), (2), and (3) above: "THOIP", "Chorion", "Miss Books", "Little Miss Books", "Books", or "Children's Books".67 A respondent answering "Miss" or "Little Miss" was not coded as confused unless further inquiry indicated the respondent had THOIP or its products in mind.68
Based on his coding, Dr. Helfgott concluded that "the great majority of respondents correctly identify the exhibits shown as Disney products."69 Specifically, 71 percent of respondents identified Disney's Little Miss Bossy shirt as a Disney shirt; 85 percent of respondents identified Disney's Miss Chatterbox shirt as a Disney shirt; 75 percent of respondents identified Disney's Miss Fabulous shirt as a Disney shirt; and 85 percent of respondents identified Disney's Little Miss Perfect shirt as a Disney shirt.70 Across the four test cells, 79 percent of respondents were not confused.71 Across the four control cells, 82 percent of respondents were not confused.72 Thus, across the total sample, 80 percent of respondents were not confused.73 Only one respondent out of 1,200 identified the allegedly infringing Disney shirt as being associated with THOIP, an incidence rate of approximately 0.08 percent (a proportion of roughly .0008). Accordingly, Dr. Helfgott concluded that "there was virtually no evidence of confusion."74
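The incidence rate for a single respondent in a sample of 1,200 can be checked directly. As a proportion it is roughly .0008; expressed as a percentage it is roughly 0.08. The helper name below is illustrative only.

```python
def incidence_rate_percent(count, sample_size):
    """Share of the sample, expressed as a percentage."""
    return 100.0 * count / sample_size


proportion = 1 / 1200                   # about 0.0008 as a fraction
rate = incidence_rate_percent(1, 1200)  # about 0.08 as a percentage

print(f"proportion: {proportion:.4f}")
print(f"percent:    {rate:.2f}")
```

On either expression the figure is far below the double-digit net confusion rates reported in the Ford Survey.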
3. Rebuttal to Helfgott Survey
THOIP proffers a report from its retained expert, Dr. Yoram (Jerry) Wind, to rebut the Helfgott Survey.75 Dr. Wind's report is addressed in Part IV.
III. APPLICABLE LAW
A. Trademark Infringement Under the Lanham Act
Claims for infringement of an unregistered trademark arise under section 43(a) of the Lanham Act ("the Act").76 Specifically, section 43(a) prohibits the use in commerce of:
any word, term, name, symbol, or device, or any combination thereof, or any false designation of origin, false or misleading description of fact, or false or misleading representation of fact, which... is likely to cause confusion, or to cause mistake, or to deceive as to the affiliation, connection, or association of such person with another person, or as to the origin, sponsorship, or approval of his or her goods, services, or commercial activities by another person.77
This section also acts as "a broad federal unfair competition provision."78
"A claim of trademark infringement, whether brought under [section 32(1)79 or section 43(a) of the Act], is analyzed under [a] familiar two-prong test ...."80 "The test looks first to whether the plaintiff's mark is entitled to protection, and second to whether defendant's use of the mark is likely to cause consumers confusion as to the origin or sponsorship of the defendant's goods."81 "The likelihood-of-confusion inquiry turns on whether `numerous ordinary prudent purchasers are likely to be misled or confused as to the source of the product in question because of the entrance in the marketplace of defendant's mark.'"82 "To support a finding of infringement, there must be a `probability of confusion, not a mere possibility.'"83 "The central consideration in assessing a mark's protectability, namely its degree of distinctiveness, is also a factor in determining likelihood of confusion."84
In determining whether there is a likelihood of confusion, courts within the Second Circuit apply the eight-factor balancing test introduced in Polaroid Corporation v. Polarad Electronics Corporation.85 The Polaroid factors are: (1) the strength of plaintiff's mark; (2) the similarity of plaintiff's and defendant's marks; (3) the proximity of the products; (4) the likelihood that plaintiff will "bridge the gap"; (5) actual confusion between products; (6) defendant's good or bad faith in adopting the mark; (7) the quality of defendant's product; and (8) the sophistication of the buyers.86 "The application of the Polaroid test is `not mechanical, but rather, focuses on the ultimate question of whether, looking at the products in their totality, consumers are likely to be confused.'"87 "No single factor is dispositive, nor is a court limited to consideration of only these factors."88 "Further, `each factor must be evaluated in the context of how it bears on the ultimate question of likelihood of confusion as to the source of the product.'"89
B. Admission of Expert Testimony
The proponent of expert evidence bears the initial burden of establishing admissibility by a "preponderance of proof."90 Rule 702 of the Federal Rules of Evidence states the following requirements for the admission of expert testimony:
If scientific, technical, or other specialized knowledge will assist the trier of fact to understand the evidence or to determine a fact in issue, a witness qualified as an expert by knowledge, skill, experience, training, or education, may testify thereto in the form of an opinion or otherwise, if (1) the testimony is based upon sufficient facts or data, (2) the testimony is the product of reliable principles and methods, and (3) the witness has applied the principles and methods reliably to the facts of the case.
Under Rule 702 and Daubert v. Merrell Dow Pharmaceuticals, Inc., the district court must determine whether the proposed expert testimony "both rests on a reliable foundation and is relevant to the task at hand."91 The district court must act as "`a gatekeeper to exclude invalid and unreliable expert testimony.'"92 In doing so, the court's focus must be on the principles and methodologies underlying the expert's conclusions, rather than on the conclusions themselves.93 "[T]he Federal Rules of Evidence favor the admissibility of expert testimony, and [courts'] role as gatekeeper is not intended to serve as a replacement for the adversary system."94
In addition, Rule 403 of the Federal Rules of Evidence states that relevant evidence "may be excluded if its probative value is substantially outweighed by the danger of unfair prejudice, confusion of the issues, or misleading the jury." "Expert evidence can be both powerful and quite misleading because of the difficulty in evaluating it. Because of this risk, the judge in weighing possible prejudice against probative force under Rule 403 ... exercises more control over experts than over lay witnesses."95
C. Survey Evidence
Factor Five of the Polaroid inquiry concerns "actual confusion" between products and "[i]t is self-evident that the existence of actual consumer confusion indicates a likelihood of consumer confusion."96 However, it is well-established that a plaintiff seeking to prevail under the Lanham Act need not prove the existence of actual confusion, "since actual confusion is very difficult to prove and the Act requires only a likelihood of confusion as to source."97
Parties to trademark infringement actions frequently use consumer surveys to demonstrate or refute a likelihood of consumer confusion.98 Obviously, "[s]urveys do not measure the degree of actual confusion by real consumers making mistaken purchases. Rather surveys create an experimental environment from which we can get useful data from which to make informed inferences about the likelihood that actual confusion will take place."99
Reliance on expert studies is not unqualified and without hazards. Indeed, "any survey is of necessity an imperfect mirror of actual customer behavior under real life conditions .... It is notoriously easy for one survey expert to appear to tear apart the methodology of a survey taken by another."100 Practically speaking, there is "no such thing as a `perfect' survey. The nature of the beast is that it is a sample, albeit a scientifically constructed one."101
To assess the validity and reliability of a survey, a court should consider a number of criteria, including whether:
(1) the proper universe was examined and the representative sample was drawn from that universe; (2) the survey's methodology and execution were in accordance with generally accepted standards of objective procedure and statistics in the field of such surveys; (3) the questions were leading or suggestive; (4) the data gathered were accurately reported; and (5) persons conducting the survey were recognized experts.102
"[T]he closer the survey methods mirror the situation in which the ordinary person would encounter the trademark, the greater the evidentiary weight of the survey results."103 The failure of a survey to approximate actual marketplace conditions can provide grounds for inadmissibility.104
While errors in survey methodology usually go to weight of the evidence, a survey should be excluded under Rule 702 when it is invalid or unreliable, and/or under Rule 403 when it is likely to be insufficiently probative, unfairly prejudicial, misleading, confusing, or a waste of time.105 Where, as here, a trademark action contemplates a jury trial rather than a bench trial, the court should scrutinize survey evidence with particular care.106
IV. DISCUSSION
Disney argues that the Ford Survey suffers from a number of flaws that individually and collectively require its exclusion under Rules 702 and 403. THOIP argues the same as to the Helfgott Survey.
A. Marketplace Conditions and Survey Format
Survey results are contingent on the method used, and different methods for assessing likelihood of confusion often produce drastically different results.107 Case in point: the Ford and Helfgott surveys. THOIP and Disney vigorously dispute the proper survey format for examining consumer confusion in the context of this case. Unsurprisingly, each side contends that its expert's survey design is the only appropriate format for the facts presented: THOIP urges the sequential array format used by Dr. Ford while Disney urges the single-exposure Eveready format used by Dr. Helfgott. In support of their respective positions, both sides deploy experts of formidable position and experience.108
In untangling this knot, tied tight by numerous opposing expert opinions, the principal question is whether either survey, if not both, sufficiently simulated the actual marketplace conditions in which consumers encountered the parties' products so as to be a reliable indicator of consumer confusion.109 In answering this fact-intensive question, the type of confusion alleged is of paramount importance.110 THOIP alleges "forward confusion" such that a consumer who encounters an allegedly infringing Disney shirt erroneously believes that shirt originates from or is affiliated in some manner with THOIP (the prior/senior user of the mark).111 THOIP has not articulated a "reverse confusion" theory that consumers perceive THOIP's shirts as having been produced by Disney (the subsequent/junior user of the mark).112 Dr. Simonson, Disney's rebuttal expert, frames the inquiry thus:
The likelihood of confusion (and the proper methodology for estimating the likelihood of confusion) depend largely on (a) the relevant consumers' awareness (or lack thereof) of THOIP's shirts (including the "Little Miss" phrase), and (b) the typical manner in which consumers encounter the THOIP shirts, and in particular, whether or not there is a high likelihood that they would encounter side-by-side the specific allegedly infringed THOIP shirt (e.g., "Little Miss Chatterbox") and the corresponding allegedly infringing Disney shirt (e.g., "Miss Chatterbox").113
THOIP tends to agree with Dr. Simonson's framing of the inquiry,114 except that, as discussed more fully below, THOIP contests the notion that a sequential array survey is only justified where the products are encountered side-by-side in the marketplace.
Before addressing each side's contentions about why the other's expert survey failed to sufficiently reflect marketplace conditions, it is important to describe those conditions and, in particular, the manner in which consumers encountered THOIP's and Disney's products in the marketplace, including whether they are competitors.
Obviously, the Little Miss THOIP, Little Miss Disney, and Miss Disney lines are within the same narrow category of goods directed at the same set of consumers.115 The products are undeniably alike: All are T-shirts bearing a cartoon and a similar phrase of a certain witticism or cheekiness, are fabricated in a common faux-distressed style, and are marketed to and for women and girls. While it is unclear whether the parties' shirts retailed for the exact same price, they were without a doubt within the same price-point.116
THOIP's shirts hit the market in the summer of 2006 and apparently continue to be offered for sale nationwide. The Little Miss Disney and Miss Disney lines were launched in February 2008 and October 2007, respectively, and were pulled from the shelves by August 2008. Therefore, THOIP's shirts overlapped with the Little Miss Disney line for approximately six months, and with the Miss Disney line for approximately ten months.
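The overlap periods can be verified by counting months between each launch and the August 2008 withdrawal. This is a rough sketch: the record gives only months, so the first day of each month is assumed here purely for the calculation.

```python
from datetime import date


def months_between(start, end):
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


# Day-of-month values are assumptions; the opinion states only month and year.
WITHDRAWN = date(2008, 8, 1)
little_miss_disney_launch = date(2008, 2, 1)
miss_disney_launch = date(2007, 10, 1)

little_miss_overlap = months_between(little_miss_disney_launch, WITHDRAWN)
miss_overlap = months_between(miss_disney_launch, WITHDRAWN)

print(little_miss_overlap, miss_overlap)
```

Because THOIP's shirts were on the market throughout, each Disney line's overlap with them equals that line's own time on the shelves.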
Because the two lines of accused Disney shirts were available in different locations, I analyze each line separately. The parties agree that the THOIP and Little Miss Disney shirts were not sold in the same stores but rather were sold in different stores within various distances of each other.117 In particular, the Little Miss Disney shirts were sold at the World of Disney Store in Manhattan, which is located at Fifth Avenue and Fifty-Fifth Street. A number of other stores within a few city blocks carried the Little Miss THOIP shirts, such as Gap Kids at Fifth Avenue and Fifty-Fourth Street, and Saks Fifth Avenue at Fifth Avenue and Fiftieth Street.118 Similarly, where the Little Miss Disney shirts were sold inside Disneyland and possibly at the World of Disney Store in the Downtown Disney complex outside of Disneyland, THOIP's shirts were sold at Vault 28 in this Downtown Disney complex. The Little Miss THOIP and Little Miss Disney shirts may also have been available in nearby stores at Disney World in Florida—specifically, while THOIP's shirts were sold at the Virgin Store, the World of Disney Store in the Downtown Disney complex may have sold Little Miss Disney shirts.119
As for the Miss Disney line, the parties agree that their shirts were sold by the same chains.120 The parties dispute whether their products were available simultaneously in specific stores,121 and, if they were, whether they were on sale near to each other within those stores. According to Disney's expert, Dr. Michel Pham, department stores, such as JC Penney, Kmart, Kohl's, and Target, "generally sold [Miss Disney] t-shirts in a specific section grouped with other Disney merchandise in order to take advantage of the Disney brand."122 THOIP counters with photographs showing THOIP and Disney shirts (though not the THOIP and Disney shirts at issue here) on display next to each other in a Walmart in Middle Island, New York, and other photographs depicting Disney shirts and apparently non-Disney shirts on display together in several chain stores in Manhattan.123 Disney responds that because the pictures do not show the actual T-shirts at issue in this suit, they prove nothing.124 THOIP's declarant, Amory Millard, responds that based on her years of experience: "[I]f Disney had not taken the Miss Disney Shirts off the market (or if Disney were to resume selling the Miss Disney Shirts) it is highly likely that the Miss Disney Shirts would be sold alongside other Disney T-shirts at Walmart, Target, and other stores ...."125
Additionally, the parties agree that the THOIP shirts and two infant Miss Disney shirts were available at the same time at Vault 28.126 Disney again asserts that its shirts were not sold next to THOIP's because the Miss Disney shirts were displayed in an armoire containing infant and children's clothes that was approximately thirty-six feet from where the THOIP shirts were displayed.127 THOIP responds that Vault 28 is a relatively small boutique of only 1,185 square feet.128
B. The Ford Survey
1. The Ford Survey Failed to Sufficiently Replicate Actual Marketplace Conditions
Disney first argues that the Ford Survey did not reflect marketplace conditions because "THOIP has failed to demonstrate beyond mere speculation that there was an appreciable likelihood of seriatim viewing of the THOIP and Disney T-shirts at issue in the real world."129 According to Dr. Simonson, "a sequential presentation of the two marks at issue (or array) is appropriate only if it reflects a significant number of real world situations in which both marks at issue are likely to be evaluated sequentially or side-by-side."130 For example, a sequential survey is proper, according to Dr. Simonson, to estimate confusion with respect to two skincare lines sold in many of the same outlets,131 Monster Milk and Monster Energy drinks,132 Bounty and Brawny paper towels,133 or Heinz and Hunts ketchup.134 In his deposition, Dr. Simonson testified that an Eveready study would probably not be appropriate in such a scenario.135 Dr. Simonson further testified that an Eveready study is the appropriate design, and a sequential survey is inappropriate, where the products are in the same category of goods but are sold in different stores within the same shopping center.136 Based on these views, Dr. Simonson opined that the Ford Survey format was not justified because "there is no evidence of any marketplace situation in which the Disney and THOIP shirts at issue appeared together."137
THOIP responds that Dr. Ford's "sequential array format is the only appropriate format for a case such as this, where both parties sell identical products, which are sold in the same stores and in the stores within close shopping proximity to one another are therefore reasonably likely to be encountered by the same consumers."138 Dr. Ford explained in his deposition why a sequential array format better replicates marketplace conditions in this case than an Eveready design:
[I]f people are out shopping and encounter, as they move from department to department within a store or move from store to store and encounter the brand, even on different days, the array study does a better job of replicating that experience than does the—an Eveready study. [The Eveready design does] not in any way reflect the shopping experience that people might have.139
Similarly, Dr. Wind opined:
[T]he Ford study which utilized the array design is the more appropriate design for the context of this case and any similar co[-]branded situation in which the parties have (i) different trademark assets—in this case Disney who own [sic] the cartoon characters, and "Little Miss" [sic] who own the unique combination of "Little Miss + personality/character traits + a cartoon character" presented as a variable family of designs[;] (ii) when one of the brands is much better known than the other and especially in an unaided awareness context[; and] (iii) both products are available in close shopping proximity in a number of real world shopping environments.140
THOIP and its experts define "close shopping proximity" to include the scenarios where the senior and junior products are available within the same stores, though not necessarily in the same section, as well as in different stores within a close distance.141
Based on all of the expert testimony and reports and my review of the academic literature and case law, I conclude that the Ford Survey did not sufficiently approximate the manner in which consumers encountered the parties' products in the marketplace. When ascertaining whether a survey methodology sufficiently simulates marketplace conditions, the focal point must be the specific products tested by the survey. THOIP has not shown a reasonable likelihood that consumers would have proximately encountered the specific pairs of shirts tested by Dr. Ford. Those were: THOIP's Little Miss Bossy with Disney's Little Miss Bossy; THOIP's Little Miss Splendid with Disney's Little Miss Perfect; THOIP's Little Miss Chatterbox with Disney's Miss Chatterbox; and THOIP's Little Miss Splendid with Disney's Miss Fabulous. Of crucial importance, Dr. Ford coupled THOIP and Disney shirts based on resemblance rather than on whether they were found together in the marketplace.142 While comparing the most similar THOIP and Disney shirts may be a useful heuristic device, a legally-probative estimation of consumer confusion must be tethered to marketplace conditions.
In contrast to THOIP's failure to establish that the specific pairs tested in the Ford Survey were to be found in close proximity in the marketplace, Disney has demonstrated that Dr. Ford's choices did not reflect the realities of the marketplace. As to the Miss Disney line specifically, the January 15, 2010 declaration of Melanie Bradley states:
A review of the Junk Food [the THOIP licensee] and Disney royalty reports indicates that the specific pairs of shirts that Dr. Ford tested were never sold in the same retail locations, with one exception. The Nordstrom's [sic] retail chain apparently carried a "Little Miss Chatterbox" THOIP T-shirt and also at some point carried a "Miss Chatterbox" Disney T-shirt. However, we have no way of knowing which Nordstrom's [sic] locations carried the T-shirts or if any Nordstrom's [sic] locations carried both T-shirts. We also do not know whether the T-shirts were carried by the Nordstrom's [sic] chain during the same time period.143
Bradley continues:
[W]hile ... other retailers such as Macy's and Urban Outfitters may have carried at one time or another one or more "Miss Disney" T-shirts and one or more THOIP T-shirts, again, the documents do not show which locations carried each T-shirt or the time period during which these T-shirts were sold. As such, there is nothing to indicate that the "Miss Disney" T-shirts and the THOIP T-shirts at issue here ever appeared in the same stores at the same time.144
THOIP has offered no proof to rebut Bradley's assertions. Thus, notwithstanding that many of the same retailers sold THOIP and Miss Disney shirts, and despite the evidence that Disney shirts are at times sold side-by-side with non-Disney shirts, such general information does not justify Dr. Ford's specific choices.
In addition, with respect to the Little Miss Disney line, even assuming arguendo the soundness of THOIP's position that a sequential array survey is justified where the products are found in different stores within short distances of each other,145 THOIP has not shown a reasonable likelihood that consumers would have proximately encountered its shirts and the Little Miss Disney shirts in a critical number of real world situations. Though THOIP's shirts were available in thousands of stores nationwide,146 the Little Miss THOIP and Little Miss Disney shirts were on sale in a small number of nearby stores in at most three geographic locations (New York City, Anaheim, and Orlando).147 In these circumstances, there is not a reasonable likelihood that consumers would have encountered in close proximity Little Miss THOIP and Little Miss Disney shirts, let alone those pairs specifically tested by Dr. Ford.148
Another aspect of Dr. Ford's methodology caused his survey to further depart from actual marketplace conditions. The shirts used in the survey did not bear the neck labels and hang tags that would have been attached to the shirts in the marketplace.149 Dr. Ford asserts it is unlikely that including such indicia of origin would have had a substantial impact on his findings because "if respondents paid attention to the neck label and/or hang tags and realized that the source of the t-shirts was Disney, they still may have perceived that Disney produced the shirts in association with THOIP or got permission from THOIP."150 Additionally, Dr. Ford contends that "[neck tags] are visible but relatively unobtrusive ...."151
While it may be true that a respondent informed about the source of a Disney shirt might have still thought that shirt was associated with or permitted by THOIP, it is very likely that source information would have diminished confusion not only as to source but also as to association and permission. Labels are not unnoticed by consumers; rather, they serve as important sources of information, including brand identification.152 Dr. Ford's failure to use hang tags and neck labels clearly is a deviation from actual marketplace conditions.153
Finally, Disney argues that the Ford Survey failed to approximate marketplace conditions because it improperly created artificial awareness of THOIP's claimed mark among survey respondents,154 and, relatedly, "tested 'conditional probability,' i.e., only the potential rate of confusion amongst the limited subset of people who were already aware of THOIP's claimed mark, not the actual likelihood of confusion among the far broader universe of potential consumers who are not."155 Of course, these alleged flaws are inherent in an otherwise-justified sequential survey testing forward confusion. Here, though public awareness of THOIP's name and that of its parent company is low,156 there is an appreciable awareness of the Little Miss (and related Mr. Men) characters.157 Therefore, that Dr. Ford showed respondents a THOIP shirt before displaying the array cannot be considered a flaw of any significance.158
Nonetheless, based on all of the foregoing discussion, I conclude that the Ford Survey failed to sufficiently replicate the manner in which consumers encountered the parties' products in the marketplace, which severely diminishes the reliability and probative force of this survey.
2. The Ford Survey Did Not Use an Adequate Control
Not only did the Ford Survey fail to approximate marketplace conditions, it also suffers from another major flaw—it did not have an effective control. A survey designed to estimate likelihood of confusion must include a proper control.159 A control is designed to estimate the degree of background "noise" or "error" in the survey. Without a proper control, there is no benchmark for determining whether a likelihood of confusion estimate is significant or merely reflects flaws in the survey methodology.160 To fulfill its function, a control should "share as many characteristics with the experimental stimulus as possible, with the key exception of the characteristic whose influence is being assessed."161 To obtain an estimate of the net likelihood of confusion, the researcher subtracts the measured confusion level in the control from the measured confusion level in the treatment version.
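The subtraction described above can be illustrated with a minimal sketch. The function name and the respondent counts below are hypothetical and are not drawn from either party's survey; they serve only to show how a control's measured "noise" is netted out of the test result, and how a control on which no respondent is coded as confused removes nothing.

```python
def net_confusion(test_confused, test_total, control_confused, control_total):
    """Net confusion rate: test-cell rate minus control-cell rate.

    All arguments are respondent counts. The figures used in the
    examples below are hypothetical illustrations only.
    """
    if test_total <= 0 or control_total <= 0:
        raise ValueError("each cell must contain at least one respondent")
    test_rate = test_confused / test_total
    control_rate = control_confused / control_total
    return test_rate - control_rate


# Hypothetical: 30 of 200 test respondents and 10 of 200 control
# respondents are coded as confused (15% gross, 5% noise).
with_noise = net_confusion(30, 200, 10, 200)

# Hypothetical: no control respondent is coded as confused, so the
# control subtracts nothing and net confusion equals gross confusion.
no_noise = net_confusion(30, 200, 0, 200)

print(f"net confusion with a working control: {with_noise:.2%}")
print(f"net confusion with a zero-noise control: {no_noise:.2%}")
```

In the second hypothetical the control serves no benchmarking function, because the reported figure is simply the gross rate from the test cell.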
Dr. Ford's control was a shirt depicting the same Disney character image as an allegedly infringing shirt but without any words. Disney argues that these control shirts, because they contained no words, were so dissimilar from the test shirts that they were not an effective control. Even assuming the control was adequate, Disney contends, "Dr. Ford's methodology ensured that in practice his 'control' would not account for any 'background' noise."162 As noted previously, under Dr. Ford's original confusion standards, no respondent selecting the control shirt was coded as confused.
In response, Dr. Ford expanded his confusion criteria and recoded the data. However, according to Disney, Dr. Ford's efforts to recode do not repair the problem with the control itself.163 I agree. Even after recoding, the control underestimates the "noise" in the survey as it only contained a cartoon character and lacked other key unprotectable elements of the test shirts, such as a descriptive term. Thus, respondents were less likely to pick the control shirt or give a response that Dr. Ford would code as confusion.
Because the Ford Survey failed to replicate actual marketplace conditions in which consumers encountered the products at issue here and failed to use an adequate control, it is not a reliable indicator of consumer confusion. Accordingly, the Ford Survey is inadmissible.
C. The Helfgott Survey
1. The Helfgott Survey Sufficiently Replicates Marketplace Conditions
Mirroring its arguments in support of the Ford Survey, THOIP argues that Dr. Helfgott's Eveready study failed to sufficiently replicate actual marketplace conditions because "[s]urvey participants never saw samples or even images of THOIP's shirts, despite the fact that both parties' shirts are sold in the same and/or proximate locations."164 In this regard, THOIP argues the Eveready format severely underestimates consumer confusion, especially where, as here, there is low public awareness of THOIP and Chorion.
As I have already explained, because of the low likelihood that consumers would have encountered, in close proximity, the specific pairs of shirts tested by Dr. Ford, his survey does not approximate the actual marketplace conditions surrounding the products at issue here. By contrast, Dr. Helfgott's Eveready study, in which respondents were exposed to a single shirt, does approximate those conditions.165
2. Other Challenges to the Helfgott Survey
THOIP claims a host of other problems with the Helfgott Survey. Most significantly, THOIP contends that Dr. Helfgott conducted an "unusual" variation of the Eveready study by only asking respondents the name of the company that put out or authorized the shirt. THOIP argues this was inappropriate because consumers do not know the name of THOIP or its parent as THOIP's "marketing strategy has been to focus on and promote the Mr. Men and Little Miss family of characters as a brand, and not to identify its licensed goods with a corporate name as the source of origin."166
Dr. Helfgott's use of the word "company" in his survey questions was not unusual.167 Indeed, Dr. Ford used the term "company" in his survey questions and testified that the use of the term was appropriate and would not mislead respondents.168 Nor did the term cause a problem, because when the confusion results were tabulated in the Helfgott Survey, the respondents did not need to know or state the exact legal name of the entity that puts out the product at issue to be counted as confused. Under Dr. Helfgott's confusion criteria, respondents who stated "Miss Books," "Little Miss Books," "Books," or "Children's Books" qualified as confused.169
THOIP asserts a number of other flaws in the Helfgott survey, including that the questions were biased and leading, the universe was incorrect, there was no reference to interviewer briefings or practice interviews, and that Dr. Helfgott relied on a subcontractor to review the verbatim responses.170 I have carefully reviewed these concerns and conclude that they go to weight and not exclusion.
Finally, THOIP argues that the Helfgott Survey and testimony should be excluded because Dr. Helfgott "destroyed and failed to produce [his notes and work papers] relied upon in forming his opinions."171 However, Dr. Helfgott testified that he only destroyed his notes after incorporating the information contained in them in his report.172 As such, THOIP has shown no prejudice and its request to exclude the survey on this ground is denied.
For the reasons set forth above, the Ford Survey is inadmissible and the Helfgott Survey is admissible. Accordingly, Disney's motion is granted and THOIP's motion is denied. The Clerk of the Court is directed to close these motions (document numbers 66 and 70). A conference is scheduled for February 16, 2010 at 4 p.m.