CS100 Lecture Notes - Lecture 10: Regular Graph, Directed Graph, Financial Institution

Document Summary

Spiders (also called crawlers or robots) are computer programs that start at one website and explore its links to other websites. As a spider visits sites and tries out their links, it collects information that is indexed so that a user can search it.

Indexing the web raises several issues. Focus: a spider can be narrowly focused on a certain topic. Subscriptions: some websites require a subscription to access, and relationships between companies and spider operators can allow Google to see what is on such a page. Dynamic content: a page may change its content based on who is viewing it, so a spider may see the page differently from a user. Query strings: does the spider need to check every page on every website?

The index itself also needs care. It should include all variations of a word, with or without accents. Stop words such as "the", "it", and "is" occur too frequently, and compiling every occurrence of them would be overwhelming, so they are not indexed. Word variants (e.g. sell, sells, selling, sold, resell, resold, unsold) need to be treated as the same word.
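The ideas in the summary can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method presented in the lecture: the three-page `PAGES` "web", the `.example` addresses, the hand-written `VARIANTS` table, and all function names are hypothetical. A real spider would fetch pages over the network, and a real index would typically use a proper stemmer rather than a lookup table.

```python
import re
import unicodedata
from collections import defaultdict, deque

# Hypothetical in-memory "web": address -> (page text, outgoing links).
PAGES = {
    "a.example": ("The café sells coffee", ["b.example", "c.example"]),
    "b.example": ("It is a cafe selling tea", ["a.example"]),
    "c.example": ("Coffee is sold here", []),
}

# Stop words named in the notes, plus "a" (added here for illustration).
STOP_WORDS = {"the", "it", "is", "a"}

# Hand-written variant table (an assumption; stands in for a real stemmer).
VARIANTS = {
    "sells": "sell", "selling": "sell", "sold": "sell",
    "resell": "sell", "resold": "sell", "unsold": "sell",
}


def strip_accents(text):
    """Decompose accented letters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def terms(text):
    """Lowercase, fold accents, drop stop words, merge word variants."""
    words = re.findall(r"[a-z0-9]+", strip_accents(text).lower())
    return [VARIANTS.get(w, w) for w in words if w not in STOP_WORDS]


def crawl(start):
    """Breadth-first walk: start at one page and follow links to the rest."""
    seen = {start}
    queue = deque([start])
    visited = []
    while queue:
        url = queue.popleft()
        if url not in PAGES:  # dead link: nothing to read
            continue
        visited.append(url)
        for link in PAGES[url][1]:
            if link not in seen:  # avoid revisiting pages
                seen.add(link)
                queue.append(link)
    return visited


def build_index(urls):
    """Map each normalized term to the set of pages containing it."""
    index = defaultdict(set)
    for url in urls:
        for term in terms(PAGES[url][0]):
            index[term].add(url)
    return index


def search(index, query):
    """Return pages that contain every (normalized) query term."""
    words = terms(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(crawl("a.example"))
print(sorted(search(index, "café")))     # ['a.example', 'b.example']
print(sorted(search(index, "selling")))  # all three pages
```

Because the same `terms` function normalizes both the pages and the query, a search for "café" also finds the page that spells it "cafe", and a search for "selling" finds pages that say "sells" or "sold".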
