Difference between revisions of "3026: Linear Sort"

Explain xkcd: It's 'cause you're dumb.
(Explanation: all comparison sorting algorithms are Omega(n log n). Saying the best ones are O(n log n) doesn’t mean anything because any O(n) algorithm is also O(n log n))
(Complimentary fix)

Revision as of 08:23, 19 December 2024

Linear Sort
Title text: The best case is O(n), and the worst case is that someone checks why.

Explanation

This explanation may be incomplete or incorrect: Created in Θ(N) TIME by an iterative Insertion Sorter working on a multidimensional array - Please change this comment when editing this page. Do NOT delete this tag too soon.
If you can address this issue, please edit the page! Thanks.

A common task in programming is to sort a list, a list being a collection of related elements of data that are stored in a linear fashion. There are dozens of algorithms that have been created for this through the years, from simple to complex, and each has its own merits with regards to how easy it is to understand / implement, how much space it uses, and how efficiently it operates on the data.

In computer science, the runtime of an algorithm can be described using Big O notation, which categorizes how the runtime grows as a function f(n) of the number of elements n operated on, as n grows larger and larger towards infinity; this gives the form O(f(n)) as the final description. Strictly, O(f(n)) is an upper bound on that growth, though it is often quoted loosely for the typical case. Being asymptotic means that Big O notation only considers the parts of the function that scale with n and disregards constant multipliers and additions to the scaling time. For instance, in an O(n) algorithm, f(n) is simply n, meaning it takes a roughly constant amount of time per element operated on. A simple example would be examining pictures: if it takes one second to look at a picture, it would take ten seconds to look at ten pictures; if it took three seconds to look at a picture, it would take thirty seconds to look at ten pictures. Both are described as O(n).

Generally, programmers seek to minimize the Big O complexity of their algorithms because, for large inputs, a lower order means less time. It can be proven that all general-purpose sorting methods based on comparing elements are Ω(n log n), meaning they need at least on the order of n log n steps in the worst case; since this grows faster than n on its own, such algorithms will always begin to take longer per element as the number of elements increases.

Here are some examples of common runtimes expressed in Big O notation, from smallest to largest:

  • O(1) - Constant time, which means the execution time is independent of the size of the data
  • O(n) - Linear time, which means the execution time grows in direct proportion to the size of the data
  • O(n log(n)) - The execution time grows proportionally to n * the logarithm of n, with the added log(n) creating an increasingly larger multiplier on the runtime
  • O(n²) - Quadratic time, meaning the execution time grows proportionally to the square of the size of the data
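To make the difference between these growth rates concrete, the following is a minimal sketch that tabulates abstract step counts for a few list sizes. The function name `growth_table` and the chosen sizes are illustrative assumptions, and the numbers are idealized counts rather than measured times.

```python
import math

def growth_table(sizes):
    """Return rows of (n, 1, n, n*log2(n), n**2) as abstract step counts."""
    rows = []
    for n in sizes:
        rows.append((n, 1, n, n * math.log2(n), n ** 2))
    return rows

# Sizes picked purely for illustration.
for n, const, linear, linearithmic, quadratic in growth_table([10, 1_000, 1_000_000]):
    print(f"n={n:>9,}  O(1)={const}  O(n)={linear:,}  "
          f"O(n log n)={linearithmic:,.0f}  O(n^2)={quadratic:,}")
```

The gap between the n and n log n columns widens slowly, while the n² column pulls away quickly, which is why the choice of sorting algorithm matters mostly for large lists.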

The code in the comic describes a 'linear' sort that first sorts the list using merge sort, which is known to take O(n log(n)) time, and then `sleep()`s (pauses with no activity) for a complementary amount of time: the number of elements multiplied by 1 million (1e6) seconds, minus the time already taken by the sort. This way, the total time always scales in proportion to the number of elements. This effectively forces the algorithm, through brute force, to fit the definition of linear time: it takes one million seconds (more than 11 days) per element, rather than following a non-linear progression as the number of elements increases. Although this algorithm does nominally run in O(n), that label hides the fact that it has been made vastly slower than the nominally 'worse' O(n log(n)) sort embedded in it.
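The comic's pseudocode could be sketched in Python roughly as follows. This is an illustration under assumptions, not the comic's actual language: the `seconds_per_item` parameter is added here so the delay can be scaled down for testing, the helper `merge_sort` returns a new list rather than sorting in place, and the delay is clamped at zero because Python's `time.sleep` rejects negative values.

```python
import time

def merge_sort(items):
    """Plain top-down merge sort; returns a new sorted list."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

def linear_sort(items, seconds_per_item=1e6):
    """Sort, then pad the runtime up to seconds_per_item * len(items)."""
    start = time.monotonic()
    result = merge_sort(items)
    remaining = seconds_per_item * len(items) - (time.monotonic() - start)
    # Clamp at zero (an addition not in the comic): time.sleep raises
    # ValueError when given a negative duration.
    time.sleep(max(0.0, remaining))
    return result

# Scaled-down demo: 0.01 s per item instead of the comic's 1e6 s.
print(linear_sort([3, 1, 2], seconds_per_item=0.01))
```

With the default of 1e6 seconds per item, sorting even three elements would take over a month, so the demo passes a much smaller value.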

For sufficiently large lists, the merge sort will take longer than the million seconds per element, which results in a negative value being passed to the sleep() function. Depending on the language, this might halt the program with a runtime error, produce unpredictably extra-long additional waits, or skip any additional wait; in every case the runtime has already exceeded O(n). However, this issue will only arise for impossibly huge lists: if, for instance, a merge sort took n log(n) microseconds to complete (which would be considered slow by today's typical processing times, and taking the logarithm to base 2), then the sort alone would exceed the comic's 'linear' target only for lists longer than 2^(1,000,000,000,000) ≈ 10^(300,000,000,000) elements, a number far larger than the number of atoms in the universe. The practical impossibility of this outcome might be why such a ridiculously high number of seconds was chosen.
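The break-even figure can be checked with a few lines of arithmetic. The sketch below assumes, as the paragraph does, that the merge sort takes n × log2(n) microseconds; the variable names are illustrative.

```python
import math

SECONDS_PER_ITEM = 1e6          # per-element budget from the comic
MICROSECONDS_PER_SECOND = 1e6

# Assumption: merge sort takes n * log2(n) microseconds.
# Break-even: n * log2(n) us == SECONDS_PER_ITEM * n s, so the n cancels
# and log2(n) equals the budget expressed in microseconds.
log2_n = SECONDS_PER_ITEM * MICROSECONDS_PER_SECOND

# Convert the exponent from base 2 to base 10.
log10_n = log2_n * math.log10(2)

print(f"break-even n = 2^{log2_n:.0e}, roughly 10^{log10_n:.3e}")
```

The base-10 exponent comes out at about 3 × 10^11, matching the 10^(300,000,000,000) figure quoted above.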

The title text refers to the best and worst case of a sort, which are additional measures of its runtime describing the shortest and longest potential times. A more refined sort may decide how much of a list needs to be passed over again after its first pass of shuffling elements around; scanning a pre-sorted list (and deducing that it has no more checking to do) could mean that no more effort is needed, resulting in a best case of O(n). Depending upon the algorithm, presenting a list in an ordering that happens to challenge it the most (such as exactly reversed) may mean that even an 'average O(n log n)' process has to exceed this, resulting in a worst-case number of operations that may be O(n²). It can be very useful to know that a given sorting method usually takes the average order of time but may have a much shorter or longer runtime, especially when a method is expected to be far worse than others and only particularly favorable input lets it approach the fast average or best-case behavior.
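Insertion sort is one simple algorithm that shows this spread between best and worst case; it is used here purely as an illustration and is not the sort used in the comic. The sketch counts element comparisons on an already sorted list and on an exactly reversed one.

```python
def insertion_sort_comparisons(items):
    """Sort a copy with insertion sort; return (sorted_list, comparison_count)."""
    data = list(items)
    comparisons = 0
    for i in range(1, len(data)):
        j = i
        while j > 0:
            comparisons += 1
            if data[j - 1] <= data[j]:
                break  # element is already in place
            data[j - 1], data[j] = data[j], data[j - 1]
            j -= 1
    return data, comparisons

n = 100
_, best = insertion_sort_comparisons(range(n))          # already sorted
_, worst = insertion_sort_comparisons(range(n, 0, -1))  # exactly reversed
print(best, worst)  # n - 1 = 99 and n * (n - 1) / 2 = 4950
```

The sorted input needs one comparison per element (linear), while the reversed input needs a number of comparisons that grows with the square of the list length.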

By forcing all practical sorts to take O(n) time, regardless of how the data is presorted, the best case (and worst case, for that matter) will also be O(n). The last part of the text then plays on another meaning of best case and worst case, as best- and worst-case scenarios for a situation, by saying that the worst outcome for the code's author is when someone decides to investigate the code (perhaps owing to its absurd runtime, or else simply out of justified skepticism about the declared optimality), whereupon that investigator will discover the deception and ruin the author's reputation.

Transcript

This transcript is incomplete. Please help editing it! Thanks.
[The panel shows five lines of code:]
function LinearSort(list):
  StartTime=Time()
  MergeSort(list)
  Sleep(1e6*length(list)-(Time()-StartTime))
  return
[Caption below the panel:]
How to sort a list in linear time



Discussion

First in linear time! Mr. I (talk) 13:28, 18 December 2024 (UTC)

Due to the fact that O(nlog(n)) outgrows O(n), the Linear Sort is not actually linear. 162.158.174.227 14:21, 18 December 2024 (UTC)

If your sleep() function can handle negative arguments "correctly", then I guess it could work. 162.158.91.91 16:27, 18 December 2024 (UTC)
It relies on 1 second being long enough to outcompete the maximum input length provided. The joke is that most sort operations that take an entire second or more are considered too slow to be worth doing. 02:30, 22 December 2024 (UTC)

That was fast... Caliban (talk) 15:35, 18 December 2024 (UTC)

Do I even want to know what Randall's thinking nowadays? ⯅A dream demon⯅ (talk) 16:02, 18 December 2024 (UTC)

Does anyone ever want to know what Randall is thinking nowadays? :P 198.41.227.177 22:02, 19 December 2024 (UTC)

The title text would be more correct if Randall used e.g. Timsort instead of Mergesort. They both have the same worst-case complexity O(n*log(n)), but the former is linear if the list was already in order, so best-case complexity is O(n). Mergesort COULD also be implemented this way, but its standard version is never linear. Bebidek (talk) 16:35, 18 December 2024 (UTC)
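The point about already-ordered input can be observed by counting comparisons made by Python's built-in sort. This sketch assumes CPython, whose `sorted` is a Timsort-derived algorithm; other interpreters may behave differently, and the helper name `count_comparisons` is made up for the illustration.

```python
import functools
import random

def count_comparisons(values):
    """Count how many comparisons the built-in sort makes on `values`."""
    count = 0

    def compare(a, b):
        nonlocal count
        count += 1
        return (a > b) - (a < b)

    sorted(values, key=functools.cmp_to_key(compare))
    return count

n = 1000
ordered = list(range(n))
shuffled = ordered[:]
random.Random(0).shuffle(shuffled)  # fixed seed for repeatability

print("already sorted:", count_comparisons(ordered))
print("shuffled:      ", count_comparisons(shuffled))
```

On an already sorted list the comparison count stays close to n, while on a shuffled list it is several times larger, consistent with O(n) best-case and O(n log n) typical behavior.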

According to my estimates extrapolated from timing the sorting of 10 million random numbers on my computer, the break-even point where the algorithm becomes worse than linear is beyond the expected heat death of the universe. I did neglect the question of where to store the input array. --162.158.154.35 16:37, 18 December 2024 (UTC)

If the numbers being sorted are unique, each would need a fair number of bits to store. (Fair meaning that the time to do the comparison would be non-negligible.) If they aren't, you can just bucket-sort them in linear time. Since we're assuming absurdly large memory capacity. 162.158.186.253 17:14, 18 December 2024 (UTC)

What system was the person writing the description using where Sleep(n) takes a parameter in whole seconds rather than the usual milliseconds? 172.70.216.162 17:20, 18 December 2024 (UTC)

First, I don't recognize the language, but sleep() takes seconds for python, C (et al.), and no doubt many others. Second, the units don't have to be seconds, they just have to be whatever `TIME()` returns, and multiplicable by 1e6 to yield a "big enough" delay. Of course, no coefficient is big enough for this to actually be linear in theory for any size list, so who cares? To be truly accurate, sleep for `e^LENGTH(LIST)`, and it really won't much matter what the units are, as long as they're big enough for `SLEEP(e)` to exceed the difference in the time it takes to sort two items versus one item. Use a language-dependent coefficient as needed. Jlearman (talk) 18:02, 18 December 2024 (UTC)
Usual where, is that the Windows API? The sleep function in the POSIX standard takes seconds. See https://man7.org/linux/man-pages/man3/sleep.3.html . 162.158.62.194 18:57, 18 December 2024 (UTC)

If I had a nickel for every time I saw an O(n) sorting algorithm using "sleep"… But this one is actually different. The one I usually see feeds the to-be-sorted value into the sleep function, so it schedules "10" to be printed in 10 seconds, then schedules "3" to be printed in 3 seconds, etc., which would theoretically be linear time, if the sleep function was magic. Fabian42 (talk) 17:25, 18 December 2024 (UTC)
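The other "sleep"-based approach described in this comment is usually called sleep sort. A minimal sketch using threads follows; the `unit` parameter (seconds of delay per unit of value) is an assumption added for the demo, and the result depends on thread scheduling, so values that are close together relative to `unit` can come out in the wrong order.

```python
import threading
import time

def sleep_sort(values, unit=0.1):
    """Sleep sort for non-negative numbers: each value waits value * unit seconds."""
    result = []
    lock = threading.Lock()

    def worker(value):
        time.sleep(value * unit)
        with lock:
            result.append(value)

    threads = [threading.Thread(target=worker, args=(v,)) for v in values]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result

print(sleep_sort([2, 0, 1]))
```

The total wall-clock time is set by the largest value rather than by the number of items, and the operating system's timer queue is quietly doing the real sorting work.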

This comic also critiques/points out the pitfalls of measuring time complexity using Big-O notation, such as an algorithm or solution that runs in linear time still being too slow for its intended use case. Sophon (talk) 17:46, 18 December 2024 (UTC)

Current text is incorrect, but I'm not sure how best to express the correction -- there do exist O(n) sorting algorithms, they're just not general-purpose, since they don't work with an arbitrary comparison function. See counting sort. 172.69.134.151 18:25, 18 December 2024 (UTC)
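For reference, a minimal counting sort sketch is shown below. It assumes the inputs are non-negative integers no larger than a known `max_value` (a parameter name chosen for this example), which is exactly the kind of restriction that keeps it from being general-purpose.

```python
def counting_sort(values, max_value):
    """Sort non-negative integers <= max_value in O(n + max_value) time."""
    counts = [0] * (max_value + 1)
    for v in values:
        counts[v] += 1          # tally how often each value occurs
    result = []
    for value, count in enumerate(counts):
        result.extend([value] * count)
    return result

print(counting_sort([3, 1, 2, 1, 0], max_value=3))
```

No two elements are ever compared with each other, which is why the Ω(n log n) bound for comparison sorts does not apply.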

Hi! I'm just gonna say this before everyone leaves and goes on their merry way. Significant comic numbers coming soon: Comics 3100, 3200, 3300, etc, Comic 3094 (The total number of frames in 'time'), Comic 4000, Comic Whatever the next April fools day comic will be, and Comic 4096. Wait for it...DollarStoreBa'al (talk) 20:42, 18 December 2024 (UTC)

Comic 3141.592654 172.70.163.144 09:16, 19 December 2024 (UTC)

As everyone observed, the stated algorithm is not theoretically linear, but only practically linear (in that the time and space to detect O(n log n) exceeds reasonable (time, space) bounds for this universe). Munroe's solution is much deeper than that though - it trivially generalises to a _constant_ O(1) bound. [run a sort algorithm, wait 20 years, give the answer]. That's the preferred way of repaying loans, too. 172.69.195.27 (talk) 21:46, 18 December 2024 (UTC) (please sign your comments with ~~~~)

Continues comic 3017's theme of worst-case optimization. 172.70.207.115 00:32, 19 December 2024 (UTC)

It looks as though this function does not actually do the sort in Linear Time, it only returns in Linear Time. The MERGESORT Function itself looks to only take one parameter and does not have an obvious return value indicating that it performs an in-place sort on the input mutable list. This means that the list is sorted at the speed of the MERGESORT function, but flow control is only returned after Linear Time. For a single threaded program calling this function there is no practical difference, but it would make a difference if some other thread was concurrently querying the list. A clearer linear time sort might look like this:

 function LinearSort(list):
   StartTime=Time()
   SortedList=MergeSort(list)
   Sleep(1e6*length(list)-(Time()-StartTime))
   return SortedList

Leon 172.70.162.70 (talk) 17:31, 19 December 2024 (please sign your comments with ~~~~)

There's such a thing as pass-by-reference, variously implemented depending upon the actual programming language used. It's even possible to accept both list (non-reference, to force a return of sorted_list) and listRef (returns nothing, or perhaps a result such as number_of_shuffles made), for added usefulness, though of course that'd need even more pseudocode to describe. For the above/comic pseudocode, it's not so arbitrary that a programmer shouldn't know how to implement it in their instance.
I might even set about to do something like use a SetStartTime() and CheckElapsedTime() function, if there's possible use; the former making a persistent (private variable) note of what =Time() it is, perhaps to an arbitrary record scoped to any parameterID it is supplied, and the latter returning the 'now' time minus the stored (default or explicitly IDed) moment of record. I could then have freely pseudocoded the extant outline in even briefer format, on the understanding what these two poke/peek functions are. Which is already left open to the imagination for MergeSort(). 172.69.43.182 18:04, 19 December 2024 (UTC)

There are situations where you want to return in O(1) time or some other time that is not dependent on the input data to prevent side-channel data leaks. While the run-time of Randall's "O(n)" algorithm has an obvious dependency on the input data, using the "Randall Algorithm" to obscure a different algorithm can reduce the side-channel opportunities. A more sure-fire way would be to have the algorithm return in precisely i seconds, where i is the number of seconds between now and the heat death of the universe. 172.71.167.89 17:49, 19 December 2024 (UTC)

Please write an explanation for non-programmers!

I don't understand this explainxkcd. The comic itself was less confusing. Can please someone who really gets this stuff write a section of the explanation that explains the joke to people like me who do not have a theoretical programming degree? I know that is a tall task but right now it reads as rambling and a bunch of 0(n) that makes no sense to me. I can cut and paste a bash script together and make it work. I can understand that putting a sleep for a million seconds in a loop somewhere makes it slow. But a layperson explanation of what makes a sort linear, what is linear, what is funny about that approach, would be better than all the arguing about 0(n) because we don't get it. Thanks in advance! You folks are awesome! 172.71.147.210 20:51, 19 December 2024 (UTC)

Maybe this would be a good start:
--cut here--
An algorithm is a step-by-step way of doing things.
A sorting algorithm is a step-by-step way to sort things.
There are several commonly used sorting algorithms. Some have very little "overhead" (think: set-up time or requiring lots of extra memory) or what I call "molasses" (yes, I just made that up) (think "taking a long time or lots of extra memory for each step") but they really bog down if you have a lot of things that need sorting. These are better if you have a small list of items to sort.
Others have more "overhead" or "molasses" but don't bog down as much when you have a lot of things that need sorting. These are better if you have a lot of things to sort.
A linear sorting algorithm would take twice as long to sort twice as many unsorted items. If it took 100 seconds to sort 100 items, then it would take 200 seconds to sort 200, 300 seconds to sort 300, and so on. Algorithms that take "twice as long to do twice as much" are said to run in "Order(n)" or "O(n)" time, where "n" is the number of items they are working on, or in the case of a sorting algorithm, the number of items to be sorted.
For traditional sorting algorithms that don't use "parallel processing" (that is, they don't do more than one thing in any given moment), a linear sorting algorithm with very little "overhead" or "molasses" would be the "holy grail" of sorting algorithms. For example, a hypothetical linear sorting algorithm that took 1/1000th of a second to "set things up" (low "overhead") and an additional 1 second to sort 1,000,000 numbers (not much "molasses") would be able to sort 2,000,000 numbers in just over 2 seconds, 10,000,000 numbers in just over 10 seconds, and 3,600,000,000 numbers in a hair over an hour.
The reality is that there is no such thing as a general-purpose linear sorting algorithm that has very little overhead (in both time and memory) and very little "molasses." All practical general-purpose sorting algorithms either use parallel processing, have a lot of overhead (set-up time or lots of memory), have a lot of "molasses" (taking a long time or lots of memory for EACH item in the list), or are "slower than linear," which means they bog down when you give them a huge list of things to sort. For example, let's say the "mergesort" in Randall's algorithm doesn't have much "overhead" or "molasses" and it sorts 1,000,000 items in 1 second. Its time is "O(nlog(n))" which is a fancy way of saying if you double the number, you'll more than double the time. This means sorting 2,000,000 items will take more than 2 seconds, and sorting 4,000,000 items will take more than twice as long as it takes to sort 2,000,000. Eventually all of those "more than's" add up and things slow to a crawl.
The joke is that Randall "pretends" to be the "holy grail" by being a linear sorting algorithm, but he has lots of "molasses" because his linear sorting algorithm takes 1 million seconds for each item in the list, compared to the 1,000,000 items per second in the hypothetical "linear sorting algorithm" I proposed.
As others in the discussion point out, Randall's "algorithm" is "busted" (breaks, doesn't work, gives undefined results) if the mergesort (which is a very fast sort if you have a large list of items) is sorting a list so big that it takes over 1 million seconds per item to sort anyway. I'll spare you the math, but if the mergesort part of Randall's "algorithm" could do 1,000,000 numbers in 1 second with 1/1000th of a second to "set things up," it would take a huge list to get it to "bust" Randall's "algorithm."
--cut here--
162.158.174.202 21:44, 19 December 2024 (UTC)
Layman's guide to O(n) time, second try:
--cut here--
First, "O" is "Order of" as in "order of magnitude." It's far from exact.
O(1) is "constant time" - the time it takes me to give you a bag that contains 5000 $1 bills doesn't depend on how many bills there are in the bag. It would take the same amount of time if the bag had only 500, 50, or even 5 bills in it.
O(log(n)) is "logarithmic time" - the time is the time it takes me to write down how many bills are in the bag. If it's 5000, I have to write down 4 digits, if it's 500, 3, if it's 50, 2, if it's 5, only 1.
O(n) is "linear time" - the time it takes me to count out each bill in the bag depends on how many bills there are. It takes a fixed amount of time to count each bill. If there's 5000 $1 bills it may take me 5000 seconds to count them. If there's 500 $1 bills, it will take me only 500 seconds.
O(nlog(n)) is "linear times logarithmic time" - the time it takes me to sort a pre-filled bag of money by serial number using a good general-purpose sorting algorithm (most good general-purpose sorting algorithms are O(nlog(n)) time). If it takes me 2 seconds to sort two $1 bills, it will take me about 3 or 4 times 5000 seconds to sort 5000 $1 bills. The "3 or 4" is very approximate, the important thing is that "logarithm of n" (in this case, logarithm of 5000) is big enough to make a difference (by a factor of 3 or 4 in this case) but far less than "n" (in this case, 5000).
O(n²) is "n squared" time, which is a special case of "polynomial time." "Polynomial time" includes things like O(n³) and O(n^1,000,000). Many algorithms including many "naive" sorting algorithms are in this category. If I used a "naive" sorting algorithm to sort 5000 $1 bills by serial number, instead of it taking about 15,000-20,000 seconds, it would take about 5,000 times 5,000 seconds. I don't know about you, but I've got better things to do with 25,000,000 seconds than sort paper money.
It gets worse (O(2^n) anyone? No thanks!), but you wanted to keep it simple.
198.41.227.177 23:30, 19 December 2024 (UTC)
Personally, I've got better things to do than sort dollar bills, full stop. 172.70.91.130 09:37, 20 December 2024 (UTC)
O() notation is about behavior with large values, not small values. Try the "handing a bag of bills" algorithm with a few million dollar bills. You're going to need a forklift. Getting a forklift is not, in practice, instantaneous. Big O notation is almost always a joke for people trying to solve real problems. It only works on an abstract machine with some really weird (not physically achievable) properties. 162.158.155.141 20:54, 20 December 2024 (UTC)

Friendly reminder that some users of this site are just here to learn what the joke is, and not to read the entire Wikipedia article on Big O Notation. Perhaps the actual explanation could be moved up a bit, and some of the fiddly Big-O stuff could be moved down? I'd do it myself, but I'm not really sure which is which. 172.70.176.28 06:42, 20 December 2024 (UTC)

I mean, it is fairly fundamental to the joke, and therefore to the explanation. It might be possible to slim it down a bit, but I don't think you can explain the joke without some explanation of Big O. 172.70.91.130 09:37, 20 December 2024 (UTC)

I've just come to the conclusion that I will never 100% understand 3026. Dogman15 (talk) 10:14, 20 December 2024 (UTC)

Tell me that again when you've actually tried the official process...
 function Understand(comic):
   StartTime=Time()
   ReadExplanation(comic)
   Sleep(1e12*length(comic)-(Time()-StartTime))
   return
172.70.162.56 11:10, 20 December 2024 (UTC)
The article should start off "This is a joke about Big-O notation and sorting algorithms, a topic in introductory computer science education." then continue with something like "An algorithm is computer code for solving a general problem. Big-O notation is a method for describing the efficiency of algorithms." and maybe something like "Randall has designed an algorithm that appears more efficient than commonly considered possible, claiming to solve a popular challenge of many decades, by trying to game how the Big-O approach to analysis ignores the real speed of an algorithm, instead considering how it changes when the data is changed." 172.68.54.209 02:43, 22 December 2024 (UTC)