HEALTHCARE ADVOCATES v. HARDING, EARLEY, FOLLMER
497 F.Supp.2d 627 (2007)
United States District Court, E.D. Pennsylvania.
July 20, 2007.
In this civil action, Healthcare Advocates alleges that the Harding firm's use of the Wayback Machine to obtain archived screenshots constituted "hacking." Although the word "hacking" is not defined in the Complaint, Healthcare Advocates claims that the Harding firm manipulated the Wayback Machine on July 9, 2003, and July 14, 2003, in a way that rendered useless a protective measure that Healthcare Advocates had employed on its website. The protective measure at issue was a robots.txt file.
Healthcare Advocates placed this file on its website as a means of preventing the public from accessing archived screenshots of www.healthcareadvocates.com that were present in Internet Archive's database. Healthcare Advocates believes that the robots.txt file acted like a digital padlock. Since the Harding firm did not have the "key," Healthcare Advocates argues that the firm could only have obtained these protected images by breaking the robots.txt "lock."
By way of background, the Internet Archive is a nonprofit organization that has created an online library of digital media in an effort to preserve digital content for future reference. Its digital database is the equivalent of a paper library, but it is filled with digital media such as websites instead of books. The library includes a collection of chronological records of various websites, which Internet Archive makes available at no cost to the public via the Wayback Machine. The library's records include more than 85 billion screenshots of web pages, which are stored on a computer database in California. Internet Archive's database allows users to study websites that have since changed or no longer exist.
The chronological records are compiled by routinely taking screenshots of websites as they exist on various days. Internet Archive collects images through a process called crawling. A crawler, or robot, is an automated program that scours the Internet and takes pictures of every web page that it is instructed to visit. The most widely recognized use of crawling is for indexing by search engines. Through indexing, search engines such as Google create lists of websites. These lists allow the search engine to provide faster searches, because the sites are all cataloged in the search engine's memory, which eliminates the need to access the web to compile search results. Internet Archive likewise relies on a crawler to supply the new screenshots it uses to complete its chronologies.
Any person with a web browser can search Internet Archive's database of archived images. Searching the database is accomplished via the Wayback Machine, which Internet Archive provides on its website. The Wayback Machine is an easy-to-use information retrieval system that allows the user to request any archived screenshots of a web page that the database may contain. First, the user goes to Internet Archive's website, located at www.archive.org, where a box bearing the title "Wayback Machine" appears in the middle of the homepage. In the box there is a small input field. The user enters the web address of the desired site into the input field, following the http:// prompt, and clicks the "Take Me Back" button directly below it to initiate a search of Internet Archive's database.
If screenshots matching the user's web address request are available, a list of the dates on which images were taken is displayed on the user's computer screen in vertical columns grouped by year. Clicking on a particular date retrieves the screenshots of the website archived for that specific date. The image appears in the user's web browser just as a live website would appear; however, the user is not viewing a live website. Instead, the user sees the static version of the website that is stored in Internet Archive's database. The Wayback Machine provides only a window into the past, through which users can see what a website looked like on a specific date.
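As an illustration only, the capture-and-retrieval cycle described above can be sketched in a few lines of Python. The sketch is a simplified model under stated assumptions, not Internet Archive's actual software: the "web" is a hypothetical in-memory dictionary, each stored copy is the page's text rather than an image, and the names `crawl`, `capture_dates`, and `retrieve` are invented for this example.

```python
from datetime import date

# Hypothetical stand-in for the live web: URL -> current page content.
live_web = {
    "http://example.com/": "<h1>Welcome</h1>",
    "http://example.com/about": "<p>About us</p>",
}

# Simplified archive: URL -> {capture date -> stored static copy}.
archive = {}


def crawl(urls, web, when):
    """Visit each URL the crawler is instructed to visit and store a dated copy."""
    for url in urls:
        page = web.get(url)
        if page is None:
            continue  # unreachable page: nothing to capture
        archive.setdefault(url, {})[when] = page


def capture_dates(url):
    """Dates on which copies of `url` were taken, oldest first."""
    return sorted(archive.get(url, {}))


def retrieve(url, when):
    """Static copy of `url` stored for `when`, or None if none was taken."""
    return archive.get(url, {}).get(when)


# Two crawls, with the live site changing in between.
crawl(list(live_web), live_web, date(2002, 1, 15))
live_web["http://example.com/"] = "<h1>Redesigned</h1>"
crawl(list(live_web), live_web, date(2002, 6, 15))

print(capture_dates("http://example.com/"))
print(retrieve("http://example.com/", date(2002, 1, 15)))  # <h1>Welcome</h1>
```

Because each copy is stored under its capture date, a later change to the live page does not alter what the archive returns for the earlier date, which mirrors the static "window into the past" the Wayback Machine offers.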
The creators of the archives seek to include only publicly available websites in their library. Websites that require passwords for access are neither included nor crawled. Website owners who do not want their sites preserved in the database can request to be excluded. Internet Archive has an exclusion policy in place that accommodates these requests, i.e., the robots.txt protocol.
The robots.txt file did not control access to Healthcare Advocates' website; it controlled only the information that was available once the website was accessed.
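The point that a robots.txt file governs only what a cooperating crawler collects, and not who may visit the site, can be illustrated with Python's standard-library `urllib.robotparser` module. In this sketch the file contents and page URL are hypothetical, and the user-agent name `ia_archiver` is an assumption about how the archive's crawler identified itself. The file is merely a set of instructions; a program honors it only if it chooses to run a check like `can_fetch` before collecting a page, and the same kind of check could in principle be run before displaying copies already held in an archive.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking one crawler to stay out of the whole site.
# "ia_archiver" is assumed here to be the archive crawler's user-agent name.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

site_page = "http://www.example.com/index.html"  # hypothetical page

# A cooperating crawler asks before it collects anything.
print(parser.can_fetch("ia_archiver", site_page))  # False: this crawler is excluded
print(parser.can_fetch("SomeBrowser", site_page))  # True: no rule names this agent
```

Nothing in the file itself blocks a request: a browser or any other program that skips the check can still load the page from the live site.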