CS100 Lecture Notes - Lecture 10: Polysemy


Document Summary

Spiders (also called crawlers or robots) are computer programs that follow links on the web and collect information from the pages they visit.

Focus: a spider can be targeted at particular topics and gather information only about them.
Politeness: a well-behaved spider cooperates with the websites it visits (not all spiders do).
Revisit frequency: how often the spider comes back to a webpage.
Paywalls: a subscription is required to see the content; some websites open backdoors so that spiders can still enter.
Dynamic content: content that changes depending on who is viewing it.
Query strings: extra identifying information we give a webpage to say what content we want to see.

Web search has many things to worry about, including:

List of occurrences: where, when, and how many times a word appears on a website.
Punctuation & hyphens: e-mail vs. email; there are many variations of the same word.
Foreign languages & accents: Beyonce vs. Beyoncé.
"Stop" words: the, it, is (no point in indexing).
Word variants: sell, sells, selling, sold, resell, resold, and unsold.
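
The indexing concerns above can be made concrete with a small sketch. This is only an illustration under stated assumptions, not how any real search engine works: the page names (a.example, b.example), the function names, and the three-word stop list are all made up for the example. It records a list of occurrences (word positions) per page, treats "e-mail" and "email" as the same word, ignores accents, and skips stop words. It does not handle word variants such as sell/sold, which would need a stemming step.

```python
import re
import unicodedata
from collections import defaultdict

# Assumption: a tiny stop list taken from the notes; real lists are much longer.
STOP_WORDS = {"the", "it", "is"}


def strip_accents(text):
    """Turn accented letters into plain ones, e.g. 'é' becomes 'e'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def tokenize(text):
    """Split text into lowercase words, keeping hyphenated words together."""
    cleaned = strip_accents(text).lower()
    return re.findall(r"[^\W_]+(?:-[^\W_]+)*", cleaned)


def build_index(pages):
    """Map each word to {page: [positions where the word occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for page, text in pages.items():
        for position, word in enumerate(tokenize(text)):
            term = word.replace("-", "")  # 'e-mail' and 'email' become one term
            if term in STOP_WORDS:
                continue  # no point in indexing stop words
            index[term][page].append(position)
    return index


# Hypothetical pages, used only to show the idea.
pages = {
    "a.example": "The e-mail is from Beyoncé",
    "b.example": "Beyonce sells email; email is it",
}
index = build_index(pages)

for page, positions in index["email"].items():
    print(page, positions, "count:", len(positions))
```

Positions count every word on the page, including stop words, so a later step could still tell which words were next to each other. Looking up index["beyonce"] finds both pages even though only one of them writes the accent.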
