Difference between revisions of "3026: Linear Sort"
(mention non-comparison-based sorting and the importance of the asymptotic runtime as n approaches infinity) |
(→Explanation: Dealing with bad 'markdown' and writing style. Though suspect it was intended (but not signed) as a Talk-page comment.) |
||
Line 35: | Line 35: | ||
By forcing all practical searches to take exactly the same time, regardless of the manner that the same data is provided for sorting, the best case and the worst case (in all practical cases) are also strictly O(''n''), for exactly the same contrived reason. For the programmer who conducted this deception, the real secret of the O(''n'') is that someone just takes the claim of a "linear sort" at face value, without worrying about why it is actually so slow. The actual worst case scenario is if someone becomes curious enough to investigate how the code works (perhaps once they discover it takes nearly twelve days per element) and uncovering the deceptive manner of the code. | By forcing all practical searches to take exactly the same time, regardless of the manner that the same data is provided for sorting, the best case and the worst case (in all practical cases) are also strictly O(''n''), for exactly the same contrived reason. For the programmer who conducted this deception, the real secret of the O(''n'') is that someone just takes the claim of a "linear sort" at face value, without worrying about why it is actually so slow. The actual worst case scenario is if someone becomes curious enough to investigate how the code works (perhaps once they discover it takes nearly twelve days per element) and uncovering the deceptive manner of the code. | ||
− | The same idea trivially generalises to a | + | The same idea trivially generalises to a ''constant'' O(1) bound for sorting. (That would be so much better than linear! Run a sort algorithm, wait until exactly 40 years after the query before the answer is passed on.) That's the preferred way of repaying loans, too. |
==Transcript== | ==Transcript== |
Revision as of 01:10, 19 December 2024
Linear Sort |
Title text: The best case is O(n), and the worst case is that someone checks why. |
Explanation
This explanation may be incomplete or incorrect: Created in Θ(N) TIME by an iterative Insertion Sorter working on a multidimensional array - Please change this comment when editing this page. Do NOT delete this tag too soon. If you can address this issue, please edit the page! Thanks. |
A common task in programming is to sort a list, a list being a collection of related elements of data that are stored in a linear fashion. There are dozens of algorithms that have been created for this through the years, from simple to complex, and each has its own merits with regards to how easy it is to understand / implement, how much space it uses, and how efficiently it operates on the data.
In computer science the runtime of an algorithm can be described using Big O Notation, which describes the asymptotic runtime (usually of an average case) as the number of elements operated on (notated as n) grows larger and larger. Specifically, O(f(n)) means that as n grows without bound, the runtime is less than f(n) times some constant value. For instance, an O(n) algorithm (here f(n) = n) would, as n grows large, take a fixed amount of time per element operated on, so the total time it takes would be a multiple of the size of the list. If it takes one second to look at a picture, it would take ten seconds to look at ten pictures. So "look at a list of pictures" is a linear operation and would be described as having runtime complexity O(n); this is described as 'linear' runtime. An algorithm described as O(n2) would take time less than or equal to some constant times n2.
Some examples of runtimes expressed in Big O notation:
- O(1) - Constant, which means the execution time is independent of the size of the data
- O(n) - Linear, which means the execution time varies in direct proportion to the size of the data
- O(n log n) - The execution time grows proportionally to n*log(n)
- O(n2) - Quadratic, meaning the execution time is proportional to the square of the size of the data.
Importantly, the Big O notation, mathematically, expresses the limiting behavior of the complexity as n approaches infinity, so any "special behavior" which happens only for limited values of n can be simply ignored.
In the most general case, sorting algorithms don't consider the concrete values which are sorted, instead, they utilize an abstract function which tells if one element is "before" another or not. It can be proven that the best general-purpose sorting methods are O(n log n). Other, more specific methods (such as Counting sort) can actually beat this limit. The comic, and the explanation below, focuses only on the general-purpose case.
Thus, a sorting algorithm in linear time does not actually exist; sorting is more complex. The time it takes to sort a list of items grows as you add more items to the list. The complexity of sorting algorithms generally ranges from O(n log n) to O(n2). For example, one way to sort is to look at all the values to find the smallest item, then look at all the values again to find the second item - the smallest one not found yet - and so on until you've positioned every item in the right place. If "looking at a number" takes one second, then you could sort a list of 2 numbers in 4 seconds: look at both numbers, then look at them a second time. Sorting 3 numbers would take 9 seconds: look at all 3 numbers 3 times to find the right position. Sorting a deck of cards this way would take 52*52 seconds = about 45 minutes. The amount of time it takes to sort a list grows faster the more items you look at. This is not the most efficient way to sort, but it gets the job done.
The 'linear' sort here first uses the merge sort, which takes time O(n log n) rather than the faster linear (i.e., O(n)) time. It does not matter, however, because it `sleep()`s for 1e6 (1 million) seconds, more than 11 days, per item in the list. It does this by sleeping for that length of time minus the time it actually took -- converting it by brute force to linear time that is so slow that it will not overcome the O(n log n) term for any dataset it will encounter. It's 'linear' because it's guaranteed to take a million seconds for every element in the list (regardless of how long the actual merge sort really took); doubling the list size doubles the number of millions of seconds it takes - at least for any imaginable list. No matter how inefficient the merge sort sort method is, it is very unlikely it would take longer than a million seconds per element. The joke is that although this algorithm runs in linear time, which is usually preferable to n log n time, the algorithm is so grotesquely slow that it would never be useful.
It should be noted that for sufficiently large lists, merge sort will take longer than the million seconds per element, at which point the sorting algorithm will stop being linear anyway. However, this issue will only arise for impossibly huge lists. If, for instance, merge sort took time n log(n) * 1 microsecond (slow by today's standards), then merge sort would take longer than the comic's 'linear' sort only for lists longer than 21,000,000,000,000 ≈ 10300,000,000,000 elements - a number far larger than the number of atoms in the universe. Mathematically speaking, though, the algorithm's complexity is actually O(n log n), since for sufficiently large n, the merge sort will dominate the run time.
The title text refers to the best and worst case of the sort, which are additional measures of the runtime of an algorithm. A given sort may only repeat a whole or partial scan of a list when a previous pass has identified a need to shuffle elements around; scanning a pre-sorted list just the once (and deducing that it has no work to do) results in a best case of O(n). Depending upon the algorithm, presenting a list that is reverse-sorted (or perhaps in some other specific ordering that happens to challenge it the most) may require even an 'average O(n log n)' process to take a worst-case number of operations of something like O(n²). It can be very useful to know that a given sorting method may take the average order of time, but have the possibility of a much shorter or longer runtime. An O(n) best case is particularly desirable when mostly expecting data that biases towards that result, but not all algorithms (or working situations) are capable of this. On the other hand, it may be more desirable to pursue an algorithm that is instead 'less worse' at handling worst-case orderings, or at least aren't expected to be far far worse than others, despite a better response under such rare ideal conditions.
By forcing all practical searches to take exactly the same time, regardless of the manner that the same data is provided for sorting, the best case and the worst case (in all practical cases) are also strictly O(n), for exactly the same contrived reason. For the programmer who conducted this deception, the real secret of the O(n) is that someone just takes the claim of a "linear sort" at face value, without worrying about why it is actually so slow. The actual worst case scenario is if someone becomes curious enough to investigate how the code works (perhaps once they discover it takes nearly twelve days per element) and uncovering the deceptive manner of the code.
The same idea trivially generalises to a constant O(1) bound for sorting. (That would be so much better than linear! Run a sort algorithm, wait until exactly 40 years after the query before the answer is passed on.) That's the preferred way of repaying loans, too.
Transcript
This transcript is incomplete. Please help editing it! Thanks. |
- [The panel shows five lines of code:]
- function LinearSort(list):
- StartTime=Time()
- MergeSort(list)
- Sleep(1e6*length(list)-(Time()-StartTime))
- return
- [Caption below the panel:]
- How to sort a list in linear time
Discussion
First in linear time!Mr. I (talk) 13:28, 18 December 2024 (UTC)
Due to the fact that O(nlog(n)) outgrows O(n), the Linear Sort is not actually linear. 162.158.174.227 14:21, 18 December 2024 (UTC)
- If your sleep() function can handle negative arguments "correctly", then I guess it could work. 162.158.91.91 16:27, 18 December 2024 (UTC)
- It relies on 1 second being long enough to outcompete the maximum input length provided. The joke is that most sort operations that take an entire second or more are considered too slow to be worth doing. 02:30, 22 December 2024 (UTC)
That was fast... Caliban (talk) 15:35, 18 December 2024 (UTC)
Do I even want to know what Randall's thinking nowadays? ⯅A dream demon⯅ (talk) 16:02, 18 December 2024 (UTC)
- Does anyone every want to know what Randall is thinking nowadays? :P 198.41.227.177 22:02, 19 December 2024 (UTC)
The title text would be more correct if Randall used e.g. Timsort instead of Mergesort. They both have the same worst-case complexity O(n*log(n)), but the former is linear if the list was already in order, so best-case complexity is O(n). Mergesort COULD also be implemented this way, but its standard version is never linear. Bebidek (talk) 16:35, 18 December 2024 (UTC)
According to my estimates extrapolated from timing the sorting of 10 million random numbers on my computer, the break-even point where the algorithm becomes worse than linear is beyond the expected heat death of the universe. I did neglect the question of where to store the input array. --162.158.154.35 16:37, 18 December 2024 (UTC)
- If the numbers being sorted are unique, each would need a fair number of bits to store. (Fair meaning that the time to do the comparison would be non-negligible.) If they aren't, you can just bucket-sort them in linear time. Since we're assuming absurdly large memory capacity. 162.158.186.253 17:14, 18 December 2024 (UTC)
What system was the person writing the description using where Sleep(n) takes a parameter in whole seconds rather than the usual milliseconds? 172.70.216.162 17:20, 18 December 2024 (UTC)
- First, I don't recognize the language, but sleep() takes seconds for python, C (et. al.), and no doubt many others. Second, the units don't have to be seconds, they just have to be whatever `TIME()` returns, and multiplicable by 1e6 to yield a "big enough" delay. Of course, no coefficient is big enough for this to actually be linear in theory for any size list, so who cares? To be truly accurate, sleep for `e^LENGTH(LIST)`, and it really won't much matter what the units are, as long as they're big enough for `SLEEP(e)` to exceed the difference in the time it takes to sort two items versus one item. Use a language-dependent coefficient as needed. Jlearman (talk) 18:02, 18 December 2024 (UTC)
- Usual where, is that the Windows API? The sleep function in the POSIX standard takes seconds. See https://man7.org/linux/man-pages/man3/sleep.3.html . 162.158.62.194 18:57, 18 December 2024 (UTC)
If I had a nickel for every time I saw an O(n) sorting algorithm using "sleep"… But this one is actually different. The one I usually see feeds the to-be-sorted value into the sleep function, so it schedules "10" to be printed in 10 seconds, then schedules "3" to be printed in 3 seconds, etc., which would theoretically be linear time, if the sleep function was magic. Fabian42 (talk) 17:25, 18 December 2024 (UTC)
This comic also critiques/points out the pitfalls of measuring time complexity using Big-O notation, such as an algorithm or solution that runs in linear time still being too slow for its intended use case. Sophon (talk) 17:46, 18 December 2024 (UTC)
Current text is incorrect, but I'm not sure how best to express the correction -- there do exist O(n) sorting algorithms, they're just not general-purpose, since they don't work with an arbitrary comparison function. See counting sort. 172.69.134.151 18:25, 18 December 2024 (UTC)
Hi! I'm just gonna say this before everyone leaves and goes on their merry way. Significant comic numbers coming soon: Comics 3100, 3200, 3300, etc, Comic 3094 (The total number of frames in 'time'), Comic 4000, Comic Whatever the next April fools day comic will be, and Comic 4096. Wait for it...DollarStoreBa'al (talk) 20:42, 18 December 2024 (UTC)
- Comic 3141.592654172.70.163.144 09:16, 19 December 2024 (UTC)
As everyone observed, the stated algorithm is not theoretically linear, but only practically linear (in that the time and space to detect O(n log n) exceeds reasonable (time, space) bounds for this universe). Munroe's solution is much deeper than that though - it trivially generalises to a _constant_ O(1) bound. [run a sort algorithm, wait 20 years, give the answer]. That's the preferred way of repaying loans, too. 172.69.195.27 (talk) 21:46, 18 December 2024 (UTC) (please sign your comments with ~~~~)
Continues comic 3017's theme of worst-case optimization. 172.70.207.115 00:32, 19 December 2024 (UTC)
It looks as though this function does not actually do the sort in Linear Time, it only returns in Linear Time. The MERGESORT Function itself looks to only take one parameter and does not have an obvious return value indicating that it performs an in-place sort on the input mutable list. This means that the list is sorted at the speed of the MERGESORT function, but flow control is only returned after Linear Time. For a single threaded program calling this function there is no practical difference, but it would make a difference if some other thread was concurrently querying the list. A clearer linear time sort might look like this:
function LinearSort(list): StartTime=Time() SortedList=MergeSort(list) Sleep(1e6*length(list)-(Time()-StartTime)) return SortedList
Leon 172.70.162.70 (talk) 17:31, 19 December 2024 (please sign your comments with ~~~~)
- There's such a thing as pass-by-reference, variously implemented depending upon the actual programming language used. It's even possible to accept both list (non-reference, to force a return of sorted_list) and listRef (returns nothing, or perhaps a result such as number_of_shuffles made), for added usefulness, though of course that'd need even more pseudocode to describe. For the above/comic pseudocode, it's not so arbitrary that a programmer shouldn't know how to implement it in their instance.
- I might even set about to do something like use a SetStartTime() and CheckElapsedTime() funtion, if there's possible use; the former making a persistant (private variable) note of what =Time() it is, perhaps to an arbitrary record scoped to any parameterID it is supplied, and the latter returning the 'now' time minus the stored (default or explicitly IDed) moment of record. I could then have freely pseudocoded the extant outline in even briefer format, on the understanding what these two poke/peek functions are. Which is already left open to the imagination for MergeSort(). 172.69.43.182 18:04, 19 December 2024 (UTC)
There are situations where you want to return in O(1) time or some other time that is not dependent on the input data to prevent side-channel data leaks. While the run-time of Randall's "O(n)" algorithm has an obvious dependencies on the input data, using the "Randall Algorithm" to obscure a different algorithm can reduce the side-channel opportunities. A more sure-fire way would be to have the algorithm return in precisely i seconds, where i is the number of seconds between now and the heat death of the universe. 172.71.167.89 17:49, 19 December 2024 (UTC)
- Please write an explanation for non-programmers!
I don't understand this explainxkcd. The comic itself was less confusing. Can please someone who really gets this stuff write a section of the explanation that explains the joke to people like me who do not have a theoretical programming degree? I know that is a tall task but right now it reads as rambling and a bunch of 0(n) that makes no sense to me. I can cut and paste a bash script together and make it work. I can understand that putting a sleep for a million seconds in a loop somewhere makes it slow. But a layperson explanation of what makes a sort linear, what is linear, what is funny about that approach, would be better than all the arguing about 0(n) because we don't get it. Thanks in advance! You folks are awesome! 172.71.147.210 20:51, 19 December 2024 (UTC)
- Maybe this would be a good start:
- --cut here--
- An algorithm is a step-by-step way of doing things.
- A sorting algorithm is a step-by-step way to sort things.
- There are several commonly used sorting algorithms. Some have very little "overhead" (think: set-up time or requiring lots of extra memory) or what I call "molassas" (yes, I just made that up) (think "taking a long time or lots of extra memory for each step") but they really bog down if you have a lot of things that need sorting. These are better if you have a small list of items to sort.
- Others have more "overhead" or "molasses" but don't bog down as much when you have a lot of things that need sorting. These are better if you have a lot of things to sort.
- A linear sorting algorithm would take twice as long to sort twice as many unsorted items. If it took 100 seconds to sort 100 items, then it would take 200 seconds to sort 200, 300 seconds to sort 300, and so on. Algorithms that take "twice as long to do twice as much" are said to run in "Order(n)" or "O(n)" time, where "n" is the number of items they are working on, or in the case of a sorting algorithm, the number of items to be sorted.
- For traditional sorting algorithms that don't use "parallel processing" (that is, they don't do more than one thing in any given moment), a linear sorting algorithm with very little "overhead" or "molasses" would be the "holy grail" of sorting algorithms. For example, a hypothetical linear sorting algorithm that took 1/1000th of a second to "set things up" (low "overhead") and an additional 1 second to sort 1,000,000 numbers (not much "molasses") would be able to sort 2,000,000 numbers in just over 2 seconds, 10,000,000 numbers in just over 10 seconds, and 3,600,000,000 numbers in a hair over an hour.
- The reality is that there is no such thing as a general-purpose linear sorting algorithm that has very little overhead (in both time and memory) and very little "molasses." All practical general-purpose sorting algorithms either use parallel processing, they have a lot of overhead (set-up time or uses lots of memory), a lot of "molasses" (takes a long time or uses lots of memory for EACH item in the list) or they are "slower than linear," which means they bog down when you give them a huge list of things to sort. For example, let's say the "mergesort" in Randall's algorithm doesn't have much "overhead" or "molasses" and it sorts 1,000,000 items in 1 second. It's time is "O(nlog(n))" which is a fancy way of saying if you double the number, you'll more than double the time. This means sorting 2,000,000 items will take more than 2 seconds, and sorting 4,000,000 items will take more than twice as long as it takes to sort 2,000,000. Eventually all of those "more than's" add up and things slow to a crawl.
- The joke is that Randall "pretends" to be the "holy grail" by being a linear sorting algorithm, but he has lots of "molasses" because his linear sorting algorithm takes 1 million seconds for each item in the list, compared to the 1,000,000 items per second in the hypothetical "linear sorting algorithm" I proposed.
- As others in the discussion point out, Randall's "algorithm" is "busted" (breaks, doesn't work, gives undefined results) if the mergesort (which is a very fast sort if you have a large list if items) is sorting a list so big that it takes over 1 million seconds per item to sort anyways. I'll spare you the math, but if the mergesort part of Randall's "algorithm" could do 1,000,000 numbers in 1 second with a 1/1000th of a second to "set things up," it would take a huge list to get it to "bust" Randall's "algorithm."
- --cut here--
- 162.158.174.202 21:44, 19 December 2024 (UTC)
- Layman's guide to O(n) time, second try:
- --cut here--
- First, "O" is "Order of" as in "order of magnitude." It's far from exact.
- O(1) is "constant time" - the time it takes me to give you a bag that contains 5000 $1 bills doesn't depend on how many bills there are in the bag. It would take the same amount of time if the bag had only 500, 50, or even 5 bills in it.
- O(log(n)) is "logarithmic time" - the time is the time it takes me to write down how many bills are in the bag. If it's 5000, I have to write down 4 digits, if it's 500, 3, if it's 50, 2, if it's 5, only 1.
- O(n) is "linear time" - the time it takes me to count out each bill in the bag depends on how many bills there are. It takes a fixed amount of time to count each bill. If there's 5000 $1 bills it may take me 5000 seconds to count them. If there's 500 $1 bills, it will take me only 500 seconds.
- O(nlog(n)) is "linear times logarithmic time" - the time it takes me to sort a pre-filled bag of money by serial number using a good general-purpose sorting algorithm (most good general-purpose sorting algorithms are O(nlog(n)) time). If it takes me 2 seconds to sort two $1 bills, it will take me about 3 or 4 times 5000 seconds to sort 5000 $1 bills. The "3 or 4" is very approximate, the important thing is that "logarithm of n" (in this case, logarithm of 5000) is big enough to make a difference (by a factor of 3 or 4 in this case) but far less than "n" (in this case, 5000).
- O(n2) is "n squared" time, which is a special case of "polynomial time." "Polynomial time" includes things like O(n3) and O(n1,000,000). Many algorithms including many "naive" sorting algorithms are in this category. If I used a "naive" sorting algorithm to sort 5000 $1 bills by serial number, instead of it taking about 15,000-20,000 seconds, it would take about 5,000 times 5,000 seconds. I don't know about you, but I've got better things to do with 25,000,000 seconds than sort paper money.
- It gets worse (O(2n) anyone? No thanks!), but you wanted to keep it simple.
- 198.41.227.177 23:30, 19 December 2024 (UTC)
- Personally, I've got better things to do than sort dollar bills, full stop.172.70.91.130 09:37, 20 December 2024 (UTC)
- O() notations is about behavior with large values, not small values. Try the "handing a bag of bills" algorithm with a few million dollar bills. You're going to need a forklift. Getting a forklift is not, in practice, instantaneous. Big N notation is almost always a joke for people trying to solve real problems. It only works on an abstract machine with some really weird (not physically achievable) properties. 162.158.155.141 20:54, 20 December 2024 (UTC)
Friendly reminder that some users of this site are just here to learn what the joke is, and not to read the entire Wikipedia article on Big O Notation. Perhaps the actual explanation could be moved up a bit, and some of the fiddly Big-O stuff could be moved down? I'd do it myself, but I'm not really sure which is which. 172.70.176.28 06:42, 20 December 2024 (UTC)
- I mean, it is fairly fundamental to the joke, and therefore to the explanation. It might be possible to slim it down a bit, but I don't think you can explain the joke without some explanation of Big O.172.70.91.130 09:37, 20 December 2024 (UTC)
I've just come to the conclusion that I will never 100% understand 3026. Dogman15 (talk) 10:14, 20 December 2024 (UTC)
- Tell me that again when you've actually tried the official process...
function Understand(comic): StartTime=Time() ReadExplanation(comic) Sleep(1e12*length(comic)-(Time()-StartTime)) return
- 172.70.162.56 11:10, 20 December 2024 (UTC)
- The article should start off "This is a joke about Big-O notation and sorting algorithms, a topic in introductory computer science education." then continue with something like "An algorithm is computer code for solving a general problem. Big-O notation is a method for describing the efficiency of algorithms." and maybe something like "Randall has designed an algorithm that appears more efficient than commonly considered possible, claiming to solve a popular challenge of many decades, by trying to game how the Big-O approach to analysis ignores the real speed of an algorithm, instead considering how it changes when the data is changed." 172.68.54.209 02:43, 22 December 2024 (UTC)