Difference between revisions of "Main Page"

Explain xkcd: It's 'cause you're dumb.
(Made main page title bigger cuz it looked weird.)
Line 10:
remain. '''[[Help:How to add a new comic explanation|Add yours]]''' while there's a chance!
</center>

== Latest comic ==
<div style="border:1px solid grey; background:#eee; padding:1em;">

Revision as of 05:35, 4 March 2013

Welcome to the explain xkcd wiki!

We have collaboratively explained 6 xkcd comics, and only 2928 (48800%) remain. Add yours while there's a chance!

Latest comic

Go to this comic explanation

Bloom Filter
Title text: Sometimes, you can tell Bloom filters are the wrong tool for the job, but when they're the right one you can never be sure.

Explanation

This explanation may be incomplete or incorrect: PROBABLY CREATED - Please change this comment when editing this page. Do NOT delete this tag too soon.

The comic is about a data structure called a Bloom filter. Software engineers use Bloom filters to check whether something is in a set, or to estimate how many things are in that set, using limited memory. One example: the Chrome web browser used to store a Bloom filter of URLs that were known to be malicious[1], built from a database that was too large to store locally. For most URLs, the Bloom filter let Chrome confirm locally that no warning was needed. Only in the rare cases where the Bloom filter said a URL might be malicious did Chrome send the URL to an external service to confirm that it was known to be malicious. The developers didn’t want the browser to send every URL to the external service, because that would leak the user’s entire browsing history to the service and add an unnecessary network delay whenever a web page was loaded.

Here's how it works:

  1. Adding items: When you add an item, it gets hashed (a way of transforming it into numbers) by several hash functions. These hash functions mark certain spots in a big array of bits (think of it as a row of lights that can be on or off).
  2. Checking items: To check if an item is in the set, you hash it with the same functions and see if all the corresponding spots are lit up. If they are, the item might be in the set, but there's a chance of a false positive (the Bloom filter could mistakenly say the item is there when it’s not). If any spot is not lit up, the item is definitely not in the set.
  3. False positives: The larger the array compared to the number of items, the lower the chance of false positives. For example, 10 bits per item gives about a 1% false positive rate when the number of hash functions is chosen well (around 7).
  4. Counting items: By counting how many bits are lit up and applying the appropriate formula, you can estimate how many distinct items have been added to the array. The accuracy of this estimate depends on several factors, but one of the most important is the size of the array: the more bits are available relative to the number of items, the narrower the estimate.
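The four steps above can be sketched in a few lines of Python. This is only an illustration, not how any particular library or browser implements it: the class name `BloomFilter`, the use of salted SHA-256 to simulate several hash functions, and the helper `false_positive_rate` are all assumptions made for this example. The count estimate uses the standard formula n ≈ -(m/k)·ln(1 - X/m), where m is the number of bits, k the number of hash functions and X the number of lit bits.

```python
import hashlib
import math


class BloomFilter:
    """A small illustrative Bloom filter backed by a list of 0/1 bits."""

    def __init__(self, num_bits, num_hashes):
        self.m = num_bits
        self.k = num_hashes
        self.bits = [0] * num_bits

    def _positions(self, item):
        # Simulate k hash functions by salting SHA-256 with the index i
        # (an assumption of this sketch; real filters use faster hashes).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        # Step 1: light up every position the hash functions point at.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # Step 2: False means "definitely not in the set",
        # True only means "possibly in the set".
        return all(self.bits[pos] for pos in self._positions(item))

    def estimated_count(self):
        # Step 4: estimate the number of distinct items from the lit bits.
        set_bits = sum(self.bits)
        if set_bits == self.m:
            return math.inf  # every bit is lit: no upper bound is possible
        return -(self.m / self.k) * math.log(1 - set_bits / self.m)


def false_positive_rate(bits_per_item, num_hashes):
    # Step 3: the usual approximation (1 - e^(-k / (m/n)))^k.
    return (1 - math.exp(-num_hashes / bits_per_item)) ** num_hashes


bf = BloomFilter(num_bits=1000, num_hashes=7)
for word in ["alpha", "beta", "gamma"]:
    bf.add(word)

print(bf.might_contain("alpha"))   # True: added items are always reported
print(bf.might_contain("delta"))   # almost certainly False at this low load
print(bf.estimated_count())        # close to 3
print(false_positive_rate(10, 7))  # about 0.008, i.e. just under 1%
```

Note that `estimated_count` returns infinity once every bit is lit, which is exactly the situation the comic's filter ends up in.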

In the comic, Cueball has a 1-bit Bloom filter, which is almost useless. When empty, it correctly says nothing is in the set. But as soon as one item is added, the bit is set to 1, and now it falsely says every possible item is in the set. Its size estimate also becomes "between 1 and infinity," which isn’t helpful.

Having multiple hash functions is pointless for a 1-bit filter since they all end up pointing to the same single bit.
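The degenerate case in the comic can be shown the same way. The class below is a hypothetical sketch written for this explanation (the name `OneBitBloomFilter` and the tuple returned by `estimated_count` are assumptions, not part of any library); it shows why every query answers "probably" after the first insertion.

```python
import math


class OneBitBloomFilter:
    """Degenerate Bloom filter with a single bit, as in the comic."""

    def __init__(self):
        self.bit = 0

    def add(self, item):
        # Any hash value modulo 1 is 0, so every hash function, however
        # many there are, selects the same (only) bit.
        position = hash(item) % 1  # always 0
        assert position == 0
        self.bit = 1

    def might_contain(self, item):
        # The answer does not depend on the item at all.
        return self.bit == 1

    def estimated_count(self):
        # (low, high) bounds: exactly 0 when empty, otherwise "1 or more".
        return (0, 0) if self.bit == 0 else (1, math.inf)


f = OneBitBloomFilter()
print(f.might_contain("cat"))  # False: an empty filter is always right
f.add("cat")
print(f.might_contain("dog"))  # True: a false positive for every item
print(f.estimated_count())     # (1, inf)
```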

The title text applies the Bloom filter's own behaviour to the decision of whether to use a Bloom filter instead of another data structure. A Bloom filter gives a definite "no" but only a probable "yes"; in the same way (according to the text), you can be sure when a Bloom filter is the wrong tool for the job, but you can only conclude that it is the right one with a limited degree of probability.

Transcript

This transcript is incomplete. Please help edit it! Thanks.
[Ponytail holds out her hand to Cueball, who is holding a paper with a 1 on it.]
Ponytail: Does your set contai-
Cueball: Yeah, probably.
[Caption below the panel:]
One-Bit Bloom Filter



New here?

You can read a brief introduction about this wiki at explain xkcd. Feel free to sign up for an account and contribute to the wiki! We need explanations for comics, characters, themes, memes and everything in between. If it is referenced in an xkcd web comic, it should be here.

  • List of all comics contains a complete table of all xkcd comics so far and the corresponding explanations. The red links (like this) are missing explanations. Feel free to help out by creating them! Here's how.

Rules

Don't be a jerk. There are a lot of comics that don't have set-in-stone explanations; feel free to put multiple interpretations in the wiki page for each comic.

If you want to talk about a specific comic, use its discussion page.

Please only submit material directly related to —and helping everyone better understand— xkcd... and of course only submit material that can legally be posted (and freely edited.) Off-topic or other inappropriate content is subject to removal or modification at admin discretion, and users who repeatedly post such content will be blocked.

If you need assistance from an admin, feel free to leave a message on their personal discussion page. The list of admins is here.
  1. Chromium Issue 10896048: Transition safe browsing from bloom filter to prefix set. (Closed) – https://chromiumcodereview.appspot.com/10896048/