3026: Linear Sort

Explain xkcd: It's 'cause you're dumb.
Revision as of 14:49, 18 December 2024 by Asdf (talk | contribs) (Transcript: for screen readers?)
Jump to: navigation, search
Linear Sort
The best case is O(n), and the worst case is that someone checks why.
Title text: The best case is O(n), and the worst case is that someone checks why.

Explanation

Ambox notice.png This explanation may be incomplete or incorrect: Created in Θ(N) TIME by a recursive Bubble Sorter working on a multidimensional array - Please change this comment when editing this page. Do NOT delete this tag too soon.
If you can address this issue, please edit the page! Thanks.

A common task in programming is to sort a list, a list being a collection of related elements of data that are stored in a linear fashion. There are dozens of algorithms that have been created through the years, from simple to complex, and each has its own merits with regards to how easy it is to understand / implement vs. how efficient it operates on the data.

The efficiency of an algorithm is measured in terms of O(), commonly referred to as "Big-O", which classifies the amount of time needed to execute the algorithm with respect to the size of the data. Specifically, the Big-O assignment describes the change in execution time when the size of the data set changes (typically, when it doubles). There are various measures of efficiency, such as:

  • O(1) - Constant, which means the execution time is independent of the size of the data
  • O(n) - Linear, which means the execution time varies in direct proportion to the size of the data
  • O(n log n) - "n log n", usually assigned to the fastest sorting algorithms
  • O(n2) - Quadratic", meaning the execution time is proportional to the square of the size of the data.

An algorithm that sorts data in Linear time would be a fantastic discovery, as the best general-purpose sorting methods are "n log n".

In computer science, the complexity of a problem can be described using Big O Notation. Operations generally take longer when they act on more elements (notated as "n"). A linear algorithm would be very simple: each element would take a short amount of time on its own, so the time it takes would be a multiple of the size of the list. For instance, if it takes one second to look at a picture, it would take ten seconds to look at ten pictures. So "look at a list of pictures" is a linear operation and would be described as having complexity O(n).

Sorting is more complex. The time it takes to sort a list of items grows quickly as you add more items to the list. The complexity of sorting algorithms generally ranges from O(n^2) to O(n*log n). For example, one way to sort is to look at all the values to find the first item, then look at all the values to find the second item, and so on until you've positioned every item in the right place. If "looking at a number" takes one second, then you could sort a list of 2 numbers in 4 seconds: look at both numbers, then look at them a second time. Sorting 3 numbers would take 9 seconds: look at all 3 numbers 3 times to find the right position. Sorting a deck of cards this way would take 52*52 seconds = about 45 minutes. You can probably read a card more quickly than that, but the point is that the amount of time it takes to sort a list grows faster the more items you are looking at. This is not the most efficient way to sort, but it gets the job done.

The 'linear' sort here uses the more efficient merge sort, which is linearithmic, taking O(n * log n) time. To give the illusion of taking linear time, it tries to make sure it takes 1e6 units of time (likely milliseconds) per item in the list by sleeping for that length of time minus the time it actually took. This will appear to be linear time for up to very large values of n, since linearithmic time will take a huge value of n to exceed the time buffer given by the `sleep()`.

In other words, sorting 20 items will take 10 times as long as sorting 2 items, because after quickly sorting the two items, you just sit around wasting a huge amount of time until it is as slow as sorting a longer list.

The title text refers to the best and worst case of the sort. The best, average and worst case of a merge sort is O(n log n), but this 'linear' sort is pretending that the best case is O(n). The title text then treats 'worst case' as the worst case for the creator of the algorithm (i.e. someone finds out that the sort isn't actually linear), rather than the worst case of the algorithm.

Transcript

Ambox notice.png This transcript is incomplete. Please help editing it! Thanks.
[The panel shows five lines of code:]
function LinearSort(list):
StartTime=Time()
MergeSort(list)
Sleep(1e6*length(list)-(Time()-StartTime))
return
[Caption below the panel:]
How to sort a list in linear time


comment.png add a comment! ⋅ comment.png add a topic (use sparingly)! ⋅ Icons-mini-action refresh blue.gif refresh comments!

Discussion

First in linear time!Mr. I (talk) 13:28, 18 December 2024 (UTC)

Due to the fact that O(nlog(n)) outgrows O(n), the Linear Sort is not actually linear. 162.158.174.227 14:21, 18 December 2024 (UTC)

If your sleep() function can handle negative arguments "correctly", then I guess it could work. 162.158.91.91 16:27, 18 December 2024 (UTC)
It relies on 1 second being long enough to outcompete the maximum input length provided. The joke is that most sort operations that take an entire second or more are considered too slow to be worth doing. 02:30, 22 December 2024 (UTC)

That was fast... Caliban (talk) 15:35, 18 December 2024 (UTC)

Do I even want to know what Randall's thinking nowadays? ⯅A dream demon⯅ (talk) 16:02, 18 December 2024 (UTC)

Does anyone every want to know what Randall is thinking nowadays? :P 198.41.227.177 22:02, 19 December 2024 (UTC)

The title text would be more correct if Randall used e.g. Timsort instead of Mergesort. They both have the same worst-case complexity O(n*log(n)), but the former is linear if the list was already in order, so best-case complexity is O(n). Mergesort COULD also be implemented this way, but its standard version is never linear. Bebidek (talk) 16:35, 18 December 2024 (UTC)

According to my estimates extrapolated from timing the sorting of 10 million random numbers on my computer, the break-even point where the algorithm becomes worse than linear is beyond the expected heat death of the universe. I did neglect the question of where to store the input array. --162.158.154.35 16:37, 18 December 2024 (UTC)

If the numbers being sorted are unique, each would need a fair number of bits to store. (Fair meaning that the time to do the comparison would be non-negligible.) If they aren't, you can just bucket-sort them in linear time. Since we're assuming absurdly large memory capacity. 162.158.186.253 17:14, 18 December 2024 (UTC)

What system was the person writing the description using where Sleep(n) takes a parameter in whole seconds rather than the usual milliseconds? 172.70.216.162 17:20, 18 December 2024 (UTC)

First, I don't recognize the language, but sleep() takes seconds for python, C (et. al.), and no doubt many others. Second, the units don't have to be seconds, they just have to be whatever `TIME()` returns, and multiplicable by 1e6 to yield a "big enough" delay. Of course, no coefficient is big enough for this to actually be linear in theory for any size list, so who cares? To be truly accurate, sleep for `e^LENGTH(LIST)`, and it really won't much matter what the units are, as long as they're big enough for `SLEEP(e)` to exceed the difference in the time it takes to sort two items versus one item. Use a language-dependent coefficient as needed. Jlearman (talk) 18:02, 18 December 2024 (UTC)
Usual where, is that the Windows API? The sleep function in the POSIX standard takes seconds. See https://man7.org/linux/man-pages/man3/sleep.3.html . 162.158.62.194 18:57, 18 December 2024 (UTC)

If I had a nickel for every time I saw an O(n) sorting algorithm using "sleep"… But this one is actually different. The one I usually see feeds the to-be-sorted value into the sleep function, so it schedules "10" to be printed in 10 seconds, then schedules "3" to be printed in 3 seconds, etc., which would theoretically be linear time, if the sleep function was magic. Fabian42 (talk) 17:25, 18 December 2024 (UTC)

This comic also critiques/points out the pitfalls of measuring time complexity using Big-O notation, such as an algorithm or solution that runs in linear time still being too slow for its intended use case. Sophon (talk) 17:46, 18 December 2024 (UTC)

Current text is incorrect, but I'm not sure how best to express the correction -- there do exist O(n) sorting algorithms, they're just not general-purpose, since they don't work with an arbitrary comparison function. See counting sort. 172.69.134.151 18:25, 18 December 2024 (UTC)

Hi! I'm just gonna say this before everyone leaves and goes on their merry way. Significant comic numbers coming soon: Comics 3100, 3200, 3300, etc, Comic 3094 (The total number of frames in 'time'), Comic 4000, Comic Whatever the next April fools day comic will be, and Comic 4096. Wait for it...DollarStoreBa'al (talk) 20:42, 18 December 2024 (UTC)

Comic 3141.592654172.70.163.144 09:16, 19 December 2024 (UTC)

As everyone observed, the stated algorithm is not theoretically linear, but only practically linear (in that the time and space to detect O(n log n) exceeds reasonable (time, space) bounds for this universe). Munroe's solution is much deeper than that though - it trivially generalises to a _constant_ O(1) bound. [run a sort algorithm, wait 20 years, give the answer]. That's the preferred way of repaying loans, too. 172.69.195.27 (talk) 21:46, 18 December 2024 (UTC) (please sign your comments with ~~~~)

Continues comic 3017's theme of worst-case optimization. 172.70.207.115 00:32, 19 December 2024 (UTC)

It looks as though this function does not actually do the sort in Linear Time, it only returns in Linear Time. The MERGESORT Function itself looks to only take one parameter and does not have an obvious return value indicating that it performs an in-place sort on the input mutable list. This means that the list is sorted at the speed of the MERGESORT function, but flow control is only returned after Linear Time. For a single threaded program calling this function there is no practical difference, but it would make a difference if some other thread was concurrently querying the list. A clearer linear time sort might look like this:

 function LinearSort(list):
   StartTime=Time()
   SortedList=MergeSort(list)
   Sleep(1e6*length(list)-(Time()-StartTime))
   return SortedList

Leon 172.70.162.70 (talk) 17:31, 19 December 2024 (please sign your comments with ~~~~)

There's such a thing as pass-by-reference, variously implemented depending upon the actual programming language used. It's even possible to accept both list (non-reference, to force a return of sorted_list) and listRef (returns nothing, or perhaps a result such as number_of_shuffles made), for added usefulness, though of course that'd need even more pseudocode to describe. For the above/comic pseudocode, it's not so arbitrary that a programmer shouldn't know how to implement it in their instance.
I might even set about to do something like use a SetStartTime() and CheckElapsedTime() funtion, if there's possible use; the former making a persistant (private variable) note of what =Time() it is, perhaps to an arbitrary record scoped to any parameterID it is supplied, and the latter returning the 'now' time minus the stored (default or explicitly IDed) moment of record. I could then have freely pseudocoded the extant outline in even briefer format, on the understanding what these two poke/peek functions are. Which is already left open to the imagination for MergeSort(). 172.69.43.182 18:04, 19 December 2024 (UTC)

There are situations where you want to return in O(1) time or some other time that is not dependent on the input data to prevent side-channel data leaks. While the run-time of Randall's "O(n)" algorithm has an obvious dependencies on the input data, using the "Randall Algorithm" to obscure a different algorithm can reduce the side-channel opportunities. A more sure-fire way would be to have the algorithm return in precisely i seconds, where i is the number of seconds between now and the heat death of the universe. 172.71.167.89 17:49, 19 December 2024 (UTC)

Please write an explanation for non-programmers!

I don't understand this explainxkcd. The comic itself was less confusing. Can please someone who really gets this stuff write a section of the explanation that explains the joke to people like me who do not have a theoretical programming degree? I know that is a tall task but right now it reads as rambling and a bunch of 0(n) that makes no sense to me. I can cut and paste a bash script together and make it work. I can understand that putting a sleep for a million seconds in a loop somewhere makes it slow. But a layperson explanation of what makes a sort linear, what is linear, what is funny about that approach, would be better than all the arguing about 0(n) because we don't get it. Thanks in advance! You folks are awesome! 172.71.147.210 20:51, 19 December 2024 (UTC)

Maybe this would be a good start:
--cut here--
An algorithm is a step-by-step way of doing things.
A sorting algorithm is a step-by-step way to sort things.
There are several commonly used sorting algorithms. Some have very little "overhead" (think: set-up time or requiring lots of extra memory) or what I call "molassas" (yes, I just made that up) (think "taking a long time or lots of extra memory for each step") but they really bog down if you have a lot of things that need sorting. These are better if you have a small list of items to sort.
Others have more "overhead" or "molasses" but don't bog down as much when you have a lot of things that need sorting. These are better if you have a lot of things to sort.
A linear sorting algorithm would take twice as long to sort twice as many unsorted items. If it took 100 seconds to sort 100 items, then it would take 200 seconds to sort 200, 300 seconds to sort 300, and so on. Algorithms that take "twice as long to do twice as much" are said to run in "Order(n)" or "O(n)" time, where "n" is the number of items they are working on, or in the case of a sorting algorithm, the number of items to be sorted.
For traditional sorting algorithms that don't use "parallel processing" (that is, they don't do more than one thing in any given moment), a linear sorting algorithm with very little "overhead" or "molasses" would be the "holy grail" of sorting algorithms. For example, a hypothetical linear sorting algorithm that took 1/1000th of a second to "set things up" (low "overhead") and an additional 1 second to sort 1,000,000 numbers (not much "molasses") would be able to sort 2,000,000 numbers in just over 2 seconds, 10,000,000 numbers in just over 10 seconds, and 3,600,000,000 numbers in a hair over an hour.
The reality is that there is no such thing as a general-purpose linear sorting algorithm that has very little overhead (in both time and memory) and very little "molasses." All practical general-purpose sorting algorithms either use parallel processing, they have a lot of overhead (set-up time or uses lots of memory), a lot of "molasses" (takes a long time or uses lots of memory for EACH item in the list) or they are "slower than linear," which means they bog down when you give them a huge list of things to sort. For example, let's say the "mergesort" in Randall's algorithm doesn't have much "overhead" or "molasses" and it sorts 1,000,000 items in 1 second. It's time is "O(nlog(n))" which is a fancy way of saying if you double the number, you'll more than double the time. This means sorting 2,000,000 items will take more than 2 seconds, and sorting 4,000,000 items will take more than twice as long as it takes to sort 2,000,000. Eventually all of those "more than's" add up and things slow to a crawl.
The joke is that Randall "pretends" to be the "holy grail" by being a linear sorting algorithm, but he has lots of "molasses" because his linear sorting algorithm takes 1 million seconds for each item in the list, compared to the 1,000,000 items per second in the hypothetical "linear sorting algorithm" I proposed.
As others in the discussion point out, Randall's "algorithm" is "busted" (breaks, doesn't work, gives undefined results) if the mergesort (which is a very fast sort if you have a large list if items) is sorting a list so big that it takes over 1 million seconds per item to sort anyways. I'll spare you the math, but if the mergesort part of Randall's "algorithm" could do 1,000,000 numbers in 1 second with a 1/1000th of a second to "set things up," it would take a huge list to get it to "bust" Randall's "algorithm."
--cut here--
162.158.174.202 21:44, 19 December 2024 (UTC)
Layman's guide to O(n) time, second try:
--cut here--
First, "O" is "Order of" as in "order of magnitude." It's far from exact.
O(1) is "constant time" - the time it takes me to give you a bag that contains 5000 $1 bills doesn't depend on how many bills there are in the bag. It would take the same amount of time if the bag had only 500, 50, or even 5 bills in it.
O(log(n)) is "logarithmic time" - the time is the time it takes me to write down how many bills are in the bag. If it's 5000, I have to write down 4 digits, if it's 500, 3, if it's 50, 2, if it's 5, only 1.
O(n) is "linear time" - the time it takes me to count out each bill in the bag depends on how many bills there are. It takes a fixed amount of time to count each bill. If there's 5000 $1 bills it may take me 5000 seconds to count them. If there's 500 $1 bills, it will take me only 500 seconds.
O(nlog(n)) is "linear times logarithmic time" - the time it takes me to sort a pre-filled bag of money by serial number using a good general-purpose sorting algorithm (most good general-purpose sorting algorithms are O(nlog(n)) time). If it takes me 2 seconds to sort two $1 bills, it will take me about 3 or 4 times 5000 seconds to sort 5000 $1 bills. The "3 or 4" is very approximate, the important thing is that "logarithm of n" (in this case, logarithm of 5000) is big enough to make a difference (by a factor of 3 or 4 in this case) but far less than "n" (in this case, 5000).
O(n2) is "n squared" time, which is a special case of "polynomial time." "Polynomial time" includes things like O(n3) and O(n1,000,000). Many algorithms including many "naive" sorting algorithms are in this category. If I used a "naive" sorting algorithm to sort 5000 $1 bills by serial number, instead of it taking about 15,000-20,000 seconds, it would take about 5,000 times 5,000 seconds. I don't know about you, but I've got better things to do with 25,000,000 seconds than sort paper money.
It gets worse (O(2n) anyone? No thanks!), but you wanted to keep it simple.
198.41.227.177 23:30, 19 December 2024 (UTC)
Personally, I've got better things to do than sort dollar bills, full stop.172.70.91.130 09:37, 20 December 2024 (UTC)
O() notations is about behavior with large values, not small values. Try the "handing a bag of bills" algorithm with a few million dollar bills. You're going to need a forklift. Getting a forklift is not, in practice, instantaneous. Big N notation is almost always a joke for people trying to solve real problems. It only works on an abstract machine with some really weird (not physically achievable) properties. 162.158.155.141 20:54, 20 December 2024 (UTC)

Friendly reminder that some users of this site are just here to learn what the joke is, and not to read the entire Wikipedia article on Big O Notation. Perhaps the actual explanation could be moved up a bit, and some of the fiddly Big-O stuff could be moved down? I'd do it myself, but I'm not really sure which is which. 172.70.176.28 06:42, 20 December 2024 (UTC)

I mean, it is fairly fundamental to the joke, and therefore to the explanation. It might be possible to slim it down a bit, but I don't think you can explain the joke without some explanation of Big O.172.70.91.130 09:37, 20 December 2024 (UTC)

I've just come to the conclusion that I will never 100% understand 3026. Dogman15 (talk) 10:14, 20 December 2024 (UTC)

Tell me that again when you've actually tried the official process...
 function Understand(comic):
   StartTime=Time()
   ReadExplanation(comic)
   Sleep(1e12*length(comic)-(Time()-StartTime))
   return
172.70.162.56 11:10, 20 December 2024 (UTC)
The article should start off "This is a joke about Big-O notation and sorting algorithms, a topic in introductory computer science education." then continue with something like "An algorithm is computer code for solving a general problem. Big-O notation is a method for describing the efficiency of algorithms." and maybe something like "Randall has designed an algorithm that appears more efficient than commonly considered possible, claiming to solve a popular challenge of many decades, by trying to game how the Big-O approach to analysis ignores the real speed of an algorithm, instead considering how it changes when the data is changed." 172.68.54.209 02:43, 22 December 2024 (UTC)