Editing 2731: K-Means Clustering


A popular class of wry observations uses the {{wiktionary|snowclone}} "There are two types of people in the world... those who do A, and those who do B". Here B will usually, though not always, be some antithesis of A. The most self-referential version is the joke "There are two types of people in the world - those who divide people into two types, and those who don't". Other well-known versions include: "There are three types of people in the world - those who can count, and those who can't", "There are two types of people in the world - those who can extrapolate... ", and "There are 10 types of people in the world - those who understand binary and those who don't."
 
  
Ponytail uses {{w|K-means_clustering|''k''-means clustering}} with k=3. This is a method of categorizing data. To explain how it works, imagine a set of people of various heights and weights who should be split into 3 groups (which gives k the value 3). One way to do this is to plot the data on a scatter chart, pick three points at random as reference points, and sort the people according to which reference point they are closest to, forming 3 initial groups. The average of the data points in each group is then found, and these averages are used as new reference points to categorize all the data into 3 new groups. This process is repeated until it converges; that is, until the data points no longer change groups even after new reference points are computed.
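The procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the comic's or any library's implementation: the function name <code>kmeans</code>, the made-up height/weight values, and the choice to draw the initial reference points from the data itself are all assumptions made for the example.

```python
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Sketch of k-means on a list of (x, y) tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: use k randomly chosen data points as the initial reference points.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Sort every point into the group of its nearest reference point
        # (squared straight-line distance on the scatter chart).
        new_labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Converged: no point changed group since the previous round.
        if new_labels == labels:
            break
        labels = new_labels
        # Replace each reference point with the average of its group.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty group keeps its old reference point
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return centroids, labels

# Hypothetical (height in cm, weight in kg) pairs, invented for illustration.
people = [(150, 50), (152, 53), (155, 49),
          (170, 70), (172, 68), (168, 72),
          (190, 95), (188, 98), (192, 92)]

centroids, labels = kmeans(people, k=3)
for person, label in zip(people, labels):
    print(person, "-> group", label)
```

Which groups come out depends on the random starting points, so a different <code>seed</code> may produce a different partition. Heights and weights are also on different scales here; a real analysis would normally rescale them first.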
  
 
 
The ''k''-means algorithm is quite simple, which contributes to its popularity, but it has a major drawback: the analyst has to decide how many groups (or clusters) to split the data into (that is, what to set k equal to). A value of k that doesn't match the underlying structure of the data can yield a partitioning that's hard to explain in terms of properties that distinguish each cluster (in other words, the clusters' qualitative interpretation is unclear).
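A small sketch can make this drawback concrete. The example below runs a one-dimensional version of the algorithm on made-up numbers that fall into three obvious clumps, once with k=3 and once with k=2. The helper names <code>kmeans_1d</code> and <code>within_group_spread</code>, the data, and the evenly spread (non-random) starting points are all assumptions chosen to keep the example reproducible.

```python
def kmeans_1d(values, k, max_iter=100):
    """Tiny 1-D k-means sketch; returns the final groups as lists of values."""
    data = sorted(values)
    n = len(data)
    # Assumption for reproducibility: instead of random starting points,
    # spread the k initial reference points evenly through the sorted data.
    centers = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assign each value to its nearest reference point.
        new_labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in data]
        if new_labels == labels:  # nothing changed group: converged
            break
        labels = new_labels
        # Move each reference point to the average of its group.
        for c in range(k):
            members = [v for v, l in zip(data, labels) if l == c]
            if members:
                centers[c] = sum(members) / len(members)
    return [[v for v, l in zip(data, labels) if l == c] for c in range(k)]

def within_group_spread(groups):
    """Sum of squared distances from each value to its group's average."""
    total = 0.0
    for g in groups:
        if g:
            mean = sum(g) / len(g)
            total += sum((v - mean) ** 2 for v in g)
    return total

values = [1, 2, 3, 11, 12, 13, 21, 22, 23]  # made-up data with three clumps

groups3 = kmeans_1d(values, 3)
groups2 = kmeans_1d(values, 2)
print("k=3:", groups3, within_group_spread(groups3))
print("k=2:", groups2, within_group_spread(groups2))
```

With k=2 there is no way to keep three clumps apart, so at least one clump has to be merged with a neighbour or split between the two groups, and the resulting groups no longer correspond to anything natural in the data.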
