Main Page

Explain xkcd: It's 'cause you're dumb.
Revision as of 05:04, 4 May 2022 by Davidy22 (talk | contribs) (This part definitely looks like it changed while I was away)

Welcome to the explain xkcd wiki!
We have an explanation for all 2934 xkcd comics, and only 6 (0%) are incomplete. Help us finish them!

Latest comic


Bloom Filter
Title text: Sometimes, you can tell Bloom filters are the wrong tool for the job, but when they're the right one you can never be sure.

Explanation

This explanation may be incomplete or incorrect: PROBABLY CREATED - Please change this comment when editing this page. Do NOT delete this tag too soon.

The comic refers to a Bloom filter, a data structure used for approximate membership queries and cardinality estimation in a bounded amount of memory. After a series of objects has been added to a Bloom filter, it can be queried to see whether another given object has already been added, with a chance of a false positive that depends on the size of the filter and on how many objects it holds. It can also be queried for an approximate count of the objects added so far.

A Bloom filter uses a large bit array and a number of hash functions that produce indexes into this array. When a value is added to the set, it is hashed with each function, and the corresponding bits in the array are set to 1. To test whether a value is in the set, you hash it with all the functions and check whether all the corresponding bits are 1. If they are, the value may be in the set, but the answer can also be a false positive, because each of those bits may have been set by some other value in the set (assuming reasonable hash functions, typically a different value for each bit). But if any of the bits is 0, you know for sure the value is not in the set. The higher the ratio between the size of the bit array and the number of elements in the set, the smaller the false positive rate (with a well-chosen number of hash functions, 10 bits per element gives about 1% false positives).
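The add and query steps described above can be sketched in a few lines of Python. This is only an illustrative toy, not a reference implementation: the class name `BloomFilter`, the method `might_contain`, the helper `expected_false_positive_rate`, and the trick of salting SHA-256 with the function number to get several hash functions are all assumptions made for this example.

```python
import hashlib
import math


class BloomFilter:
    """Toy Bloom filter: m bits and k hash functions derived from SHA-256."""

    def __init__(self, m: int, k: int) -> None:
        if m < 1 or k < 1:
            raise ValueError("m and k must be positive")
        self.m = m
        self.k = k
        self.bits = [0] * m  # a list of ints stands in for a packed bit array

    def _indexes(self, value: str):
        # Assumption: salting the value with the function number i is used
        # here as a simple way to obtain k different hash functions.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, value: str) -> None:
        for idx in self._indexes(value):
            self.bits[idx] = 1

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True only means "possibly present".
        return all(self.bits[idx] for idx in self._indexes(value))


def expected_false_positive_rate(m: int, k: int, n: int) -> float:
    """Standard approximation (1 - e^(-k*n/m))^k after n insertions."""
    return (1.0 - math.exp(-k * n / m)) ** k


bf = BloomFilter(m=10_000, k=7)
for name in ("cueball", "ponytail", "megan"):
    bf.add(name)

print(bf.might_contain("cueball"))  # True: added values are never missed
# 10 bits per element with 7 hash functions: a little under 1%
print(expected_false_positive_rate(m=10_000, k=7, n=1_000))
```

A real implementation would pack the bits and use faster non-cryptographic hashes; SHA-256 is used here only because it ships with the standard library.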

The joke in the comic is that Cueball has a 1-bit Bloom filter. When the set is empty, it accurately reports that any value is not in the set. But as soon as anything is added to the set, every query comes back positive, so the false positive rate is 100%: that single bit is set, and everything hashes to that index. Similarly, the cardinality estimate is (correctly) 0 initially, but after the first addition the estimate will be "somewhere between 1 and infinity", which is not a terribly useful estimate.

There's also no point in having multiple hash functions for a 1-bit filter, since there's only one possible hash value.
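The one-bit case can be simulated directly. The sketch below is hypothetical: the class `OneBitBloomFilter` and its method names are invented for illustration, and the count estimate uses one commonly cited fill-ratio formula, n ≈ -(m/k) · ln(1 - X/m) with X the number of set bits, which is an assumption about which estimator is meant.

```python
import math


class OneBitBloomFilter:
    """Degenerate Bloom filter with a single bit (m = 1, k = 1)."""

    def __init__(self) -> None:
        self.bit = 0

    def add(self, value) -> None:
        # Any hash taken modulo 1 is 0, so every value maps to the same bit
        # no matter which (or how many) hash functions are used.
        index = hash(value) % 1
        if index == 0:  # always true
            self.bit = 1

    def might_contain(self, value) -> bool:
        # The answer cannot depend on the value: there is only one bit to check.
        return self.bit == 1

    def estimated_count(self) -> float:
        # Fill-ratio estimate with m = k = 1: -ln(1 - X), where X is the bit.
        if self.bit == 0:
            return 0.0
        return math.inf  # ln(0) diverges, so a full filter gives no upper bound


f = OneBitBloomFilter()
print(f.might_contain("anything"))     # False: an empty filter is always right
f.add("xkcd")
print(f.might_contain("xkcd"))         # True
print(f.might_contain("never added"))  # True: a false positive
print(f.estimated_count())             # inf
```

After a single insertion, the filter answers "yeah, probably" to every question, which is exactly Cueball's reply in the comic.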

The title text carries the characteristics of the Bloom filter into the decision-making process for choosing a Bloom filter over other candidate data structures. Analogously (according to the text), you can be sure when a Bloom filter is not the best approach, but you can only conclude that it is the best approach with a limited degree of probability.

Transcript

This transcript is incomplete. Please help editing it! Thanks.
[Ponytail holds out her hand to Cueball, who is holding a paper with a 1 on it.]
Ponytail: Does your set contai-
Cueball: Yeah, probably.
[Caption below the panel:]
One-Bit Bloom Filter



New here?

Last 7 days (Top 10)

Lots of people contribute to make this wiki a success. Many of the recent contributors, listed above, have just joined. You can do it too! Create your account here.

You can read a brief introduction about this wiki at explain xkcd. Feel free to sign up for an account and contribute to the wiki! We need explanations for comics, characters, themes and everything in between. If it is referenced in an xkcd web comic, it should be here.

  • There are incomplete explanations listed here. Feel free to help out by expanding them!

Rules

Don't be a jerk.

There are a lot of comics that don't have set-in-stone explanations; feel free to put multiple interpretations in the wiki page for each comic.

If you want to talk about a specific comic, use its discussion page.

Please only submit material directly related to (and helping everyone better understand) xkcd... and of course only submit material that can legally be posted (and freely edited). Off-topic or other inappropriate content is subject to removal or modification at admin discretion, and users who repeatedly post such content will be blocked.

If you need assistance from an admin, post a message to the Admin requests board.