focused crawler


English Wikipedia - The Free EncyclopediaDownload this dictionary
Focused crawler
A focused crawler is a web crawler that collects Web pages that satisfy some specific property, by carefully prioritizing the crawl frontier]and managing the hyperlink exploration process. Some predicates may be based on simple, deterministic and surface properties. For example, a crawler's mission may be to crawl pages from only the .jp domain. Other predicates may be softer or comparative, e.g., "crawl pages with large PageRank", or "crawl pages about baseball". An important page property pertains to topics, leading to topical crawlers. For example, a topical crawler may be deployed to collect pages about solar power, or swine flu, while minimizing resources spent fetching pages on other topics. Crawl frontier management may not be the only device used by focused crawlers; they may use a Web directory, a Web text indexbacklinks, or any other Web artifact.

See more at Wikipedia.org...


© This article uses material from Wikipedia® and is licensed under the GNU Free Documentation License and under the Creative Commons Attribution-ShareAlike License