A focused crawler is a web crawler that collects Web pages satisfying some specific property (a predicate) by carefully prioritizing the crawl frontier and managing the hyperlink exploration process. Some predicates may be based on simple, deterministic, surface properties. For example, a crawler's mission may be to crawl pages from only the .jp domain. Other predicates may be softer or comparative, e.g., "crawl pages with large PageRank" or "crawl pages about baseball". An important page property pertains to topics, leading to topical crawlers. For example, a topical crawler may be deployed to collect pages about solar power or swine flu while minimizing the resources spent fetching pages on other topics. Crawl frontier management may not be the only device used by focused crawlers; they may also use a Web directory, a Web text index, backlinks, or any other Web artifact.
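The idea of prioritizing the crawl frontier can be sketched as a best-first search over a priority queue. This is a minimal, hypothetical illustration rather than any particular published crawler: the names `focused_crawl`, `fetch`, `is_solar`, and the toy `WEB` dictionary are invented for the example, the hard predicate `allow` stands in for a rule such as "stay in the .jp domain", and the priority heuristic (an out-link inherits the relevance of the page that links to it) is an assumption chosen for simplicity.

```python
import heapq
from typing import Callable, Dict, List, Set, Tuple

Fetch = Callable[[str], Tuple[str, List[str]]]


def focused_crawl(
    seeds: List[str],
    fetch: Fetch,
    score: Callable[[str], float],
    budget: int,
    allow: Callable[[str], bool] = lambda url: True,
) -> List[str]:
    """Best-first crawl that always fetches the highest-priority frontier URL."""
    # heapq is a min-heap, so priorities are stored negated; ties break on the URL string.
    frontier: List[Tuple[float, str]] = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen: Set[str] = set(seeds)
    fetched: List[str] = []
    while frontier and len(fetched) < budget:
        _, url = heapq.heappop(frontier)
        text, links = fetch(url)
        fetched.append(url)
        relevance = score(text)  # soft predicate, e.g. "is this page about solar power?"
        for link in links:
            # allow() is a hard, deterministic predicate, e.g. "stay in the .jp domain".
            if link not in seen and allow(link):
                seen.add(link)
                # Assumed heuristic: an out-link inherits the relevance of the page linking to it.
                heapq.heappush(frontier, (-relevance, link))
    return fetched


# Hypothetical in-memory "Web": url -> (page text, out-links).
WEB: Dict[str, Tuple[str, List[str]]] = {
    "a": ("solar panels and solar power", ["b", "c"]),
    "b": ("baseball scores", ["d"]),
    "c": ("solar power inverters", ["e"]),
    "d": ("more baseball", []),
    "e": ("solar farm news", []),
}


def fetch(url: str) -> Tuple[str, List[str]]:
    # Stand-in for an HTTP fetch plus link extraction.
    return WEB[url]


def is_solar(text: str) -> float:
    # Toy topical score: 1.0 if the page mentions "solar", else 0.0.
    return 1.0 if "solar" in text else 0.0


print(focused_crawl(["a"], fetch, is_solar, budget=4))  # ['a', 'b', 'c', 'e']
```

With the same budget of four fetches, a breadth-first crawler would fetch `a`, `b`, `c`, `d`; the focused crawler instead spends its last fetch on the on-topic page `e`, because `d` is reachable only from the off-topic page `b` and therefore sits lower in the frontier. Real topical crawlers typically use richer relevance estimates (for example, classifiers over page or anchor text), which would replace `is_solar` and the inheritance heuristic here.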