CS100 Lecture Notes - Lecture 10: Bow Tie
Document Summary
- Crawl: to follow hyperlinks automatically on the World Wide Web or a particular web site and retrieve documents, typically for the purpose of indexing.
- After a page's postings list is created, its hyperlinks are extracted so the linked pages can be read and indexed in turn.
- Crawlers maintain queues of discovered links to make sure no hyperlinks are missed.
- Crawlers start from pages that are known to be good seeds (e.g. www.yahoo.com).
- According to Web Dragons, the structure of the web resembles a bow tie.
- Deep web: a collection of data stored on pages without HTML, not accessible to many web crawlers.
- Sink page: a page that has no links to other pages.
- The size of the web is measured by the number of indexed pages; the total size of the web is estimated by comparing coverage and overlaps.
- Search engines employ multiple crawlers simultaneously to gather more pages per hour.
- Crawlers revisit web pages periodically so that indexes remain correct.
- For a web page to be discovered, it needs to be linked from an existing (already known) page.
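The queue-based crawling idea above can be sketched in a few lines. The sketch below simulates the web as an in-memory dictionary of hypothetical page names (a real crawler would fetch and parse HTML instead); it shows how a queue plus a visited set ensures every reachable link is followed exactly once, and why a page with no inbound links is never discovered.

```python
from collections import deque

# Hypothetical simulated web: page -> hyperlinks found on that page.
# (Assumed example data; real crawlers extract links from fetched HTML.)
WEB = {
    "seed.example": ["a.example", "b.example"],
    "a.example":    ["b.example", "sink.example"],
    "b.example":    ["seed.example", "a.example"],
    "sink.example": [],                 # a sink page: no outgoing links
    "orphan.example": ["a.example"],    # never linked from anywhere, so never found
}

def crawl(seed):
    """Breadth-first crawl: queue newly discovered links so none are missed."""
    queue = deque([seed])
    visited = set()
    order = []
    while queue:
        page = queue.popleft()
        if page in visited:             # skip pages already processed
            continue
        visited.add(page)
        order.append(page)              # stand-in for "fetch and index the page"
        for link in WEB.get(page, []):  # extract hyperlinks from the page
            if link not in visited:
                queue.append(link)      # enqueue for a later visit
    return order

# Starting from the seed, the crawl reaches every linked page,
# but orphan.example is never discovered: nothing links to it.
print(crawl("seed.example"))
```

Note how `orphan.example` never appears in the output even though it exists in `WEB`: this illustrates the last point above, that a page must be linked from an already known page to be discovered.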