CS100 Lecture Notes - Lecture 10: Bow Tie


Document Summary

Crawl: to automatically follow hyperlinks on the World Wide Web, or within a particular web site, in order to retrieve documents, typically for the purpose of indexing. After a retrieved page has been added to the postings lists, its hyperlinks are extracted so that the linked pages can in turn be read and indexed. Crawlers keep a queue of discovered hyperlinks to make sure none are missed. They start from pages known to be good seeds (e.g. www.yahoo.com).

According to Web Dragons, the structure of the web resembles a bow tie. Deep web: data stored in pages that are not ordinary HTML documents, which many web crawlers cannot access. Sink page: a page that has no links to other pages.

The size of the web is measured by the number of indexed pages. The total size of the web is estimated by comparing the coverage of different search engines and the overlap between them. Search engines run multiple crawlers simultaneously to gather more pages per hour, and crawlers revisit web pages periodically so that indexes remain correct. For a web page to be discovered, it needs to be linked from a page that already exists in the crawl.
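The crawling loop described above (start from seeds, keep a queue, extract links, notice sink pages) can be sketched as a breadth-first traversal. This is a minimal illustration, not any search engine's actual crawler: the `LINKS` dictionary is a hypothetical stand-in for the web, and "fetching" a page just means looking up its outgoing links in that dictionary.

```python
from collections import deque

# Hypothetical link graph standing in for the web: page -> pages it links to.
# A real crawler would download each URL and extract hyperlinks from its HTML.
LINKS = {
    "seed.example": ["a.example", "b.example"],
    "a.example": ["c.example"],
    "b.example": ["a.example", "c.example"],
    "c.example": [],                  # sink page: no outgoing links
    "orphan.example": ["a.example"],  # nothing links here, so it is never found
}

def crawl(seeds, links):
    """Breadth-first crawl from seed pages; returns (visit order, sink pages)."""
    frontier = deque(seeds)  # queue of discovered but not yet fetched pages
    seen = set(seeds)        # every page ever queued, so none is fetched twice
    order, sinks = [], []
    while frontier:
        page = frontier.popleft()
        order.append(page)   # "fetch" and index the page
        outlinks = links.get(page, [])
        if not outlinks:
            sinks.append(page)
        for target in outlinks:
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return order, sinks

print(crawl(["seed.example"], LINKS))
# (['seed.example', 'a.example', 'b.example', 'c.example'], ['c.example'])
```

Note that `orphan.example` is never visited: it links out, but no known page links to it, which is why a new page must be linked from an existing one to be discovered.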
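One common way to turn "coverage and overlaps" into a number is a capture-recapture (Lincoln-Petersen) estimate. The sketch below assumes two search engines index pages independently of each other. Under that assumption, the fraction of engine A's pages that engine B also has approximates B's share of the whole web. The page counts in the example are made up for illustration.

```python
def estimate_web_size(size_a, size_b, overlap):
    """Capture-recapture estimate of the total number of web pages.

    size_a, size_b: pages indexed by search engines A and B.
    overlap: pages indexed by both.
    Assumes A and B sample the web independently, so that
    overlap / size_a is roughly B's coverage (size_b / total).
    """
    if overlap <= 0:
        raise ValueError("overlap must be positive")
    # Solving size_b / total = overlap / size_a for total:
    return size_a * size_b / overlap

# Hypothetical counts: A has 600 pages, B has 400, and 120 appear in both.
print(estimate_web_size(600, 400, 120))  # 2000.0
```

If the engines' indexes are correlated (for example, both favour popular pages), the overlap is inflated and this estimate comes out too small.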
