TrustRank, as defined in a paper by Jan Pedersen (Yahoo!), Hector Garcia-Molina (Stanford), and Zoltán Gyöngyi (Stanford), located at
http://www.vldb.org/conf/2004/RS15P3.PDF, is a set of techniques for determining whether a page is reputable or spam. The system is not fully automated: it requires some human input.
TrustRank is designed to help identify pages and sites that are likely to be spam and those that are likely to be reputable. The algorithm first selects a small seed set of pages to be evaluated manually. To select the seed sites, the authors use a form of Inverse PageRank, which favors sites that link out to many other sites. Many of the candidates were then removed, including DMOZ clones and sites not listed in major directories. The final set was narrowed to sites backed by a clear authority (such as a governmental or educational institution or a company) that controlled the contents of the site. Once the seed set is determined, a human examines each seed page and rates it as either spam or reputable. The algorithm then takes this reviewed set of seed pages and rates other pages based on their connectivity with the trusted seed pages.
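The seed-selection step can be illustrated with a minimal sketch. It assumes the web graph is a plain dictionary mapping each page to the pages it links to, and it computes an ordinary PageRank on the reversed graph, so that pages linking out widely score highest. The function names, the damping value of 0.85, the iteration count, and the toy graph are all illustrative assumptions, not details taken from the paper. The directory filtering and the human review described above are not modeled.

```python
def inverse_pagerank(links, damping=0.85, iterations=20):
    """PageRank on the reversed graph: pages that link out widely score high.

    `links` maps a page to the list of pages it links to (assumed format).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}

    # Reverse every edge: if p links to q, then q points to p in the reversed graph.
    reversed_links = {p: [] for p in pages}
    for source, targets in links.items():
        for target in targets:
            reversed_links[target].append(source)

    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_score = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = reversed_links[p]
            if not targets:
                continue  # no reversed out-edges: this page's share is simply dropped
            share = damping * score[p] / len(targets)
            for q in targets:
                new_score[q] += share
        score = new_score
    return score


def select_seed_candidates(links, k):
    """Return the k pages with the highest inverse PageRank (ties broken by name)."""
    scores = inverse_pagerank(links)
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


# Toy graph (hypothetical): "hub" links out to three pages, "a" links to one.
toy_links = {"hub": ["a", "b", "c"], "a": ["b"]}
candidates = select_seed_candidates(toy_links, 2)
print(candidates)
```

In this toy graph the page with the most outgoing links ranks first. In practice the candidates would still need the filtering and manual rating steps before being used as seeds.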
The authors of the TrustRank method assume that spam pages are built to fool search engines rather than to provide useful information. They also assume that trusted pages rarely point to spam pages, except when they are tricked into it (for example, when users post spam URLs in a forum).
The farther a page is from a trusted page in the link structure, the less certain it is that the page can also be trusted, with two or three steps away being the maximum. In other words, trust is reduced as the algorithm moves away from the good seed pages. The authors give formulas for trust dampening, in which trust shrinks with each additional link step, and trust splitting, in which a page's trust is divided among the pages it links to. Using these formulas, some portion of a page's trust is passed along to the pages to which it links.
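A minimal sketch of how trust might flow outward from the seeds follows. It assumes a biased-PageRank style iteration in which each page passes a dampened fraction of its trust along its outgoing links, split equally among them, while the reviewed good seeds are topped up on every round. The dictionary graph format, the function name, the damping value of 0.85, and the iteration count are assumptions made for illustration. The paper's own formulas may differ in detail.

```python
def propagate_trust(links, good_seeds, damping=0.85, iterations=20):
    """Spread trust from human-approved seed pages along outgoing links.

    `links` maps a page to the list of pages it links to (assumed format).
    `good_seeds` is the set of pages a reviewer rated as reputable.
    """
    good_seeds = set(good_seeds)
    pages = set(links) | {t for ts in links.values() for t in ts} | good_seeds

    # Seeds share one unit of trust equally; every other page starts at zero.
    seed_share = {
        p: (1.0 / len(good_seeds) if p in good_seeds else 0.0) for p in pages
    }
    trust = dict(seed_share)

    for _ in range(iterations):
        # Each round, seeds are re-supplied with a small amount of trust.
        new_trust = {p: (1.0 - damping) * seed_share[p] for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if not targets:
                continue
            # Dampening: only a fraction is passed on.
            # Splitting: that fraction is divided among the outgoing links.
            share = damping * trust[p] / len(targets)
            for q in targets:
                new_trust[q] += share
        trust = new_trust
    return trust


# Toy graph (hypothetical): the seed reaches "a" and "b" in one step and "c"
# in two steps; "spam" and "spam2" are not reachable from the seed at all.
toy_links = {"seed": ["a", "b"], "a": ["c"], "spam": ["spam2"]}
trust = propagate_trust(toy_links, {"seed"})
for page in sorted(trust, key=lambda p: (-trust[p], p)):
    print(page, round(trust[page], 4))
```

In this toy graph the scores fall with each step away from the seed, and pages with no path from a trusted seed receive no trust.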
TrustRank can be used alone to filter the index, or in combination with PageRank to determine search engine rankings.