Difference between revisions of "1667: Algorithms"

Explain xkcd: It's 'cause you're dumb.
Jump to: navigation, search
(Deleting reference to East-West schism; nothing in the comic suggests this.)
 
(23 intermediate revisions by 12 users not shown)
Line 10: Line 10:
 
An algorithm is a basic set of instructions for performing a task, usually on a computer. This comic lists some algorithms in increasing order of complexity, where complexity may refer to either {{w|computational complexity theory}} (a formal mathematical account of the {{w|computational resource}}s – primarily computation time and memory space – required to solve a given problem), or the more informal notion of {{w|programming complexity}} (roughly, a measure of the number and degrees of internal dependencies and interactions within a piece of software).
 
An algorithm is a basic set of instructions for performing a task, usually on a computer. This comic lists some algorithms in increasing order of complexity, where complexity may refer to either {{w|computational complexity theory}} (a formal mathematical account of the {{w|computational resource}}s – primarily computation time and memory space – required to solve a given problem), or the more informal notion of {{w|programming complexity}} (roughly, a measure of the number and degrees of internal dependencies and interactions within a piece of software).
  
At the simplest end is '''left-pad''', or adding filler characters on the left end of a string to make it a particular length. In many programming languages, this is one line of code. This is possibly an allusion to a [http://www.haneycodes.net/npm-left-pad-have-we-forgotten-how-to-program/ recent incident] when {{w|Npm (software)|NodeJS Package Manager}} [https://www.techdirt.com/articles/20160324/17160034007/namespaces-intellectual-property-dependencies-big-giant-mess.shtml angered a developer] in its handling of a trademark claim. The developer unpublished all of his modules from NPM, including a package implementing left-pad. A huge number of programs depended on this third-party library instead of programming it on their own, and they immediately ceased to function.
+
At the simplest end is '''left-pad''', or adding filler characters on the left end of a string to make it a particular length. In many programming languages, this is one line of code. This is possibly an allusion to a [http://www.haneycodes.net/npm-left-pad-have-we-forgotten-how-to-program/ incident] when {{w|Npm (software)|NodeJS Package Manager}} [https://www.techdirt.com/articles/20160324/17160034007/namespaces-intellectual-property-dependencies-big-giant-mess.shtml angered a developer] in its handling of a trademark claim. The developer unpublished all of his modules from NPM, including a package implementing left-pad. A huge number of programs depended on this third-party library instead of programming it on their own, and they immediately ceased to function.
  
 
'''{{w|Quicksort}}''' is an efficient and commonly used {{w|sorting algorithm}}.
 
'''{{w|Quicksort}}''' is an efficient and commonly used {{w|sorting algorithm}}.
Line 16: Line 16:
 
'''{{w|Git (software)|Git}}''' is a {{w|version control}} program, i.e., software that allows multiple people to work on the same files at the same time. When someone finalizes ("commits") their changes, the version control program needs to join the new content with the existing content. When more than one person has made overlapping changes at the same time, the process of figuring out how to join them is called '''{{w|Merge (version control)|merging}}''', and the algorithm for it is anything but simple.
 
'''{{w|Git (software)|Git}}''' is a {{w|version control}} program, i.e., software that allows multiple people to work on the same files at the same time. When someone finalizes ("commits") their changes, the version control program needs to join the new content with the existing content. When more than one person has made overlapping changes at the same time, the process of figuring out how to join them is called '''{{w|Merge (version control)|merging}}''', and the algorithm for it is anything but simple.
  
A '''{{w|self-driving car}}''' is an automobile with sensors and software built into it so that it can maneuver in traffic autonomously, i.e. without a human controller. Various companies have been working on such vehicles for many years now, and while they're further along now than would have been imaginable even a couple of years ago, we're still far away from the dream of hopping in a driver-less taxi and sitting back as the car itself navigates to where we want to be. Recently [[Randall]] has made several references to self-driving cars, for instance in [[1559: Driving]],[[1623: 2016 Conversation Guide]] and [[1625: Substitutions 2]].
+
A '''{{w|self-driving car}}''' is an automobile with sensors and software built into it so that it can maneuver in traffic autonomously, i.e. without a human controller. Vehicles that require zero user-guidance on the roads were commonly predicted to become widely available in the 2010s, but the algorithms required to handle any possible road and weather situation has proven much more complex than expected. [[Randall]] made several [[:Category:Self-driving cars|references to self-driving cars]] in the mid-2010s.
  
The '''{{w|Google Search}} backend''' is what enables you to type "what the heck is a leftpad algorithm" into your browser and have Google return a list of relevant results, including correcting "leftpad" to "left-pad", ignoring the "what the heck" part, and sometimes even summarizing the findings into a box at the top of the results. Behind all that magic is a way to remember what pages the Internet contains, which is just a mind-bogglingly large quantity of data, and an even more mind-numbingly complex set of algorithms for processing that data.
+
The '''{{w|Google Search}} backend''' is what enables you to type "what the heck is a leftpad algorithm" into your browser and have Google return a list of relevant results, including correcting "leftpad" to "left-pad", truncating "what the heck is" to simply "what is", and sometimes even summarizing the findings into a box at the top of the results. Behind all that magic is a way to remember what pages the Internet contains, which is just a mind-bogglingly large quantity of data, and an even more [[1605: DNA|mind-numbingly complex]] set of algorithms for processing that data.
  
 
The last item is the punchline: a sprawling {{w|Microsoft Excel|Excel}} {{w|spreadsheet}} built up over 20 years by a church group in Nebraska to coordinate their scheduling. Spreadsheets are a general {{w|end-user development}} programming technique, and therefore people use Excel for all sorts of purposes that have nothing to do with accounting (its original purpose), including one guy who made a [http://arstechnica.com/gaming/2013/04/how-an-accountant-created-an-entire-rpg-inside-an-excel-spreadsheet/ role-playing game that runs in Excel]; but even that doesn't approach the complexity that develops when multiple people of varying levels of experience use a spreadsheet over many years for the purpose of coordinating the schedule of several coordinated groups.
 
The last item is the punchline: a sprawling {{w|Microsoft Excel|Excel}} {{w|spreadsheet}} built up over 20 years by a church group in Nebraska to coordinate their scheduling. Spreadsheets are a general {{w|end-user development}} programming technique, and therefore people use Excel for all sorts of purposes that have nothing to do with accounting (its original purpose), including one guy who made a [http://arstechnica.com/gaming/2013/04/how-an-accountant-created-an-entire-rpg-inside-an-excel-spreadsheet/ role-playing game that runs in Excel]; but even that doesn't approach the complexity that develops when multiple people of varying levels of experience use a spreadsheet over many years for the purpose of coordinating the schedule of several coordinated groups.
Line 24: Line 24:
 
The scheduling of tasks over a group of resources (a.k.a. the ''{{w|nurse scheduling problem}}''), while respecting the constraints set by each person, is a {{w|NP-hardness|highly complex}} problem requiring stochastic or heuristic methods for its resolution. Here, the algorithm would be further complicated by being solved by inexpert users over a spreadsheet model without using engineering practices. The potential hyperbole here is in thinking that such combination of circumstances would produce complexity far over that required to drive a car or sort the public contents of the Internet. While most churches meet mainly on Sunday morning, scheduling of what happens during the service when (especially if there are multiple concurrent services) as well as Sunday School, church business meetings, and congregation-wide events all potentially needing to be scheduled on a particular Sunday morning, the need to find a solution very close to the best possible solution quickly becomes a dire need. Furthermore, with different members involved in a wide variety of activities within and outside of the church, and the classrooms available to the church on Sunday itself, (just scheduling the choir practice times to coordinate with everyone's work schedules is very possibly impossible, especially if two people share the same occupation, and one is the relief for the other,) can indeed be daunting. In addition, there would likely be assorted committee meetings and youth groups during the week.
 
The scheduling of tasks over a group of resources (a.k.a. the ''{{w|nurse scheduling problem}}''), while respecting the constraints set by each person, is a {{w|NP-hardness|highly complex}} problem requiring stochastic or heuristic methods for its resolution. Here, the algorithm would be further complicated by being solved by inexpert users over a spreadsheet model without using engineering practices. The potential hyperbole here is in thinking that such combination of circumstances would produce complexity far over that required to drive a car or sort the public contents of the Internet. While most churches meet mainly on Sunday morning, scheduling of what happens during the service when (especially if there are multiple concurrent services) as well as Sunday School, church business meetings, and congregation-wide events all potentially needing to be scheduled on a particular Sunday morning, the need to find a solution very close to the best possible solution quickly becomes a dire need. Furthermore, with different members involved in a wide variety of activities within and outside of the church, and the classrooms available to the church on Sunday itself, (just scheduling the choir practice times to coordinate with everyone's work schedules is very possibly impossible, especially if two people share the same occupation, and one is the relief for the other,) can indeed be daunting. In addition, there would likely be assorted committee meetings and youth groups during the week.
  
In the title text, part of the spreadsheet's complexity is described as originating from different versions of the file for different programs. The words used like {{w|schism}} and {{w|sect}} are normally used in context of religions splitting into groups about differences in beliefs. In this case, the split seems to have been not over a {{w|theology|theological}} issue, but about the use of {{w|open-source software|open-source}} vs. {{w|proprietary software|proprietary}} software, disagreements about which are often compared to religious debates. Most likely, the schism being referred to is the {{w|East–West Schism|East-West Schism of 1054}}.
+
In the title text, part of the spreadsheet's complexity is described as originating from different versions of the file for different programs. The words used like {{w|schism}} and {{w|sect}} are normally used in context of religions splitting into groups about differences in beliefs. In this case, the split seems to have been not over a {{w|theology|theological}} issue, but about the use of {{w|open-source software|open-source}} vs. {{w|proprietary software|proprietary}} software, disagreements about which are often compared to religious debates.
  
 
The title text also implies that while trying to reconcile after the schism and to merge the two schedules they reinvented an alternative to Git within the spreadsheet itself, making the algorithms in place at least as complicated as that. Since most spreadsheet programs have a sort algorithm built in, that aspect is implied too, and left-padding could be compared to vamping on an introduction to a hymn. This would indicate that the other milestones of complexity are either included in the current version of the spreadsheet or are planned to be implemented.
 
The title text also implies that while trying to reconcile after the schism and to merge the two schedules they reinvented an alternative to Git within the spreadsheet itself, making the algorithms in place at least as complicated as that. Since most spreadsheet programs have a sort algorithm built in, that aspect is implied too, and left-padding could be compared to vamping on an introduction to a hymn. This would indicate that the other milestones of complexity are either included in the current version of the spreadsheet or are planned to be implemented.
Line 42: Line 42:
  
 
{{comic discussion}}
 
{{comic discussion}}
 +
 
[[Category:Charts]]
 
[[Category:Charts]]
 
[[Category:Google Search]]
 
[[Category:Google Search]]
 
[[Category:Programming]]
 
[[Category:Programming]]
 
[[Category:Religion]]
 
[[Category:Religion]]
 +
[[Category:Self-driving cars]]
 +
[[Category:Spreadsheets]]
 +
[[Category:Version Control]]

Latest revision as of 15:04, 23 August 2024

Algorithms
There was a schism in 2007, when a sect advocating OpenOffice created a fork of Sunday.xlsx and maintained it independently for several months. The efforts to reconcile the conflicting schedules led to the reinvention, within the cells of the spreadsheet, of modern version control.
Title text: There was a schism in 2007, when a sect advocating OpenOffice created a fork of Sunday.xlsx and maintained it independently for several months. The efforts to reconcile the conflicting schedules led to the reinvention, within the cells of the spreadsheet, of modern version control.

Explanation[edit]

An algorithm is a basic set of instructions for performing a task, usually on a computer. This comic lists some algorithms in increasing order of complexity, where complexity may refer to either computational complexity theory (a formal mathematical account of the computational resources – primarily computation time and memory space – required to solve a given problem), or the more informal notion of programming complexity (roughly, a measure of the number and degrees of internal dependencies and interactions within a piece of software).

At the simplest end is left-pad, or adding filler characters on the left end of a string to make it a particular length. In many programming languages, this is one line of code. This is possibly an allusion to a incident when NodeJS Package Manager angered a developer in its handling of a trademark claim. The developer unpublished all of his modules from NPM, including a package implementing left-pad. A huge number of programs depended on this third-party library instead of programming it on their own, and they immediately ceased to function.

Quicksort is an efficient and commonly used sorting algorithm.

Git is a version control program, i.e., software that allows multiple people to work on the same files at the same time. When someone finalizes ("commits") their changes, the version control program needs to join the new content with the existing content. When more than one person has made overlapping changes at the same time, the process of figuring out how to join them is called merging, and the algorithm for it is anything but simple.

A self-driving car is an automobile with sensors and software built into it so that it can maneuver in traffic autonomously, i.e. without a human controller. Vehicles that require zero user-guidance on the roads were commonly predicted to become widely available in the 2010s, but the algorithms required to handle any possible road and weather situation has proven much more complex than expected. Randall made several references to self-driving cars in the mid-2010s.

The Google Search backend is what enables you to type "what the heck is a leftpad algorithm" into your browser and have Google return a list of relevant results, including correcting "leftpad" to "left-pad", truncating "what the heck is" to simply "what is", and sometimes even summarizing the findings into a box at the top of the results. Behind all that magic is a way to remember what pages the Internet contains, which is just a mind-bogglingly large quantity of data, and an even more mind-numbingly complex set of algorithms for processing that data.

The last item is the punchline: a sprawling Excel spreadsheet built up over 20 years by a church group in Nebraska to coordinate their scheduling. Spreadsheets are a general end-user development programming technique, and therefore people use Excel for all sorts of purposes that have nothing to do with accounting (its original purpose), including one guy who made a role-playing game that runs in Excel; but even that doesn't approach the complexity that develops when multiple people of varying levels of experience use a spreadsheet over many years for the purpose of coordinating the schedule of several coordinated groups.

The scheduling of tasks over a group of resources (a.k.a. the nurse scheduling problem), while respecting the constraints set by each person, is a highly complex problem requiring stochastic or heuristic methods for its resolution. Here, the algorithm would be further complicated by being solved by inexpert users over a spreadsheet model without using engineering practices. The potential hyperbole here is in thinking that such combination of circumstances would produce complexity far over that required to drive a car or sort the public contents of the Internet. While most churches meet mainly on Sunday morning, scheduling of what happens during the service when (especially if there are multiple concurrent services) as well as Sunday School, church business meetings, and congregation-wide events all potentially needing to be scheduled on a particular Sunday morning, the need to find a solution very close to the best possible solution quickly becomes a dire need. Furthermore, with different members involved in a wide variety of activities within and outside of the church, and the classrooms available to the church on Sunday itself, (just scheduling the choir practice times to coordinate with everyone's work schedules is very possibly impossible, especially if two people share the same occupation, and one is the relief for the other,) can indeed be daunting. In addition, there would likely be assorted committee meetings and youth groups during the week.

In the title text, part of the spreadsheet's complexity is described as originating from different versions of the file for different programs. The words used like schism and sect are normally used in context of religions splitting into groups about differences in beliefs. In this case, the split seems to have been not over a theological issue, but about the use of open-source vs. proprietary software, disagreements about which are often compared to religious debates.

The title text also implies that while trying to reconcile after the schism and to merge the two schedules they reinvented an alternative to Git within the spreadsheet itself, making the algorithms in place at least as complicated as that. Since most spreadsheet programs have a sort algorithm built in, that aspect is implied too, and left-padding could be compared to vamping on an introduction to a hymn. This would indicate that the other milestones of complexity are either included in the current version of the spreadsheet or are planned to be implemented.

Transcript[edit]

Algorithms
By Complexity

More complex →
Leftpad Quicksort GIT
Merge
Self-
driving
car
Google
Search
backend
Sprawling Excel spreadsheet
built up over 20 years by a
church group in Nebraska to
coordinate their scheduling


comment.png add a comment! ⋅ comment.png add a topic (use sparingly)! ⋅ Icons-mini-action refresh blue.gif refresh comments!

Discussion

How can an excel spreadsheet be complicated? 108.162.244.85 04:52, 13 April 2016 (UTC

See this example http://arstechnica.com/gaming/2013/04/how-an-accountant-created-an-entire-rpg-inside-an-excel-spreadsheet/ 108.162.216.82 (talk) (please sign your comments with ~~~~)
Oh my The Twenty-second. The Not So Only. The Nathan/Nk22 (talk) 10:36, 13 April 2016 (UTC)
also http://www.geocities.jp/nchikada/pac/ (it's geocities!) --141.101.98.84 11:56, 13 April 2016 (UTC)
Or the thing I use at work. Work in finance - reconciling 1 code takes half a GIGABYTE of spreadsheets every 4 weeks that had been added to and tweaked so many times in the last 5 years by 6 different people, 5 of which are no longer at the company (and the last one hasn't used it in over a year) that it took me 4 months just to understand how the damned things worked and went together, and my work asks me to teach internal excel classes and run drop ins to help people out. First thing I did when I understood it was tear it down and rebuild from the ground up. Didn't cut the number of sheets (actually ended up with more), but it now takes 3 days not 15 and has dropped from half a gig to about 270MB thanks to a major cleanup in the one file I kept. 141.101.70.103 (talk) (please sign your comments with ~~~~)

Leftpad is a reference to the recent incident where a developer unpublished all his libraries from the NodeJS Package Manager, causing much disruption: http://www.theregister.co.uk/2016/03/23/npm_left_pad_chaos/ 162.158.85.231 05:58, 13 April 2016 (UTC)

In the off chance that this is referencing an actual spreadsheet, and if anyone has a link, please post it in my talk page. (And in the article of course, but talk page first) Mikemk (talk) 06:45, 13 April 2016 (UTC)

The remark about quicksort's efficiency doesn't make sense. It's still the most common and practical general sorting algorithm. It's about as efficient you can typically get except in specialized cases or with some specific type of data. Should be removed imo. 141.101.81.121 08:52, 13 April 2016 (UTC)

From Wikipedia: Quicksort (sometimes called partition-exchange sort) is an efficient sorting algorithm,
From explainxkcd: Next is quicksort, a classic (if not very efficient) way to sort a list of items [1].
Demro (talk) 12:34, 13 April 2016 (UTC)

Added a [Citation Needed] for the excel based RPG. More so I can read about it/play it than anything else.. Xseo (talk) 09:07, 13 April 2016 (UTC)

Thank you whoever put that in Xseo (talk) 11:54, 13 April 2016 (UTC)

Added a bit to the church line. Just because you only see what happens on Sunday morning, for one hour, doesn't mean there's not more happening just beneath the surface. The classroom list at our church looks like a professional buildings office directory, and I know of members having to choose between two activities because both meet, or practice, at the same time. For instance, I know of a prospective AV team member who will never be a full time AV member, because she's a Soprano and already in Bells. (AV is setting up and debugging while the choir is practicing, and naturally it's hard to run a mixer or video switcher from the choir loft.)

Mind you, it's still hyperbole, but not to the degree previously given in the explanation. -- Sean Roach (talk) (please sign your comments with ~~~~)

is anyone else concerned that randall doesn't label his axis? is it logarithmic? --141.101.98.84 11:56, 13 April 2016 (UTC)

Is the Nebraskan excel sheet a loose reference to how you couldn't initially order Windows in Nebraska (from what I can gather), or am I over-analyzing this? (https://www.youtube.com/watch?v=fqKqQmSHkEg at 0:57) 141.101.98.85 12:18, 13 April 2016 (UTC)

Maybe it's a 'flat' joke. Elvenivle (talk) 15:48, 13 April 2016 (UTC)


Interestingly enough, when I actually searched for "what the heck is a leftpad algorithm" sans quotations, google didn't pull up any results at all.Kirdneh (talk) 14:24, 13 April 2016 (UTC)

Huh, that's strange. Someone had better find a query that actually works and edit the article accordingly.
Also, the page told me, "In order to show you the most relevant results, we have omitted some entries very similar to the 0 already displayed. If you like, you can repeat the search with the omitted results included." I clicked the link to repeat it and still, nothing showed up. Just goes to show that even Google's algorithms can have small flaws and quirks. ~AgentMuffin
Weird, when I try that query I get nearly 100k results. (The first now being this site) Tahg (talk) 23:07, 13 April 2016 (UTC)

Were .xlsx spreadsheets even around in 2007? Wasn't that final x added to the extension with office 2010? 108.162.249.158 20:30, 13 April 2016 (UTC)

xlsx was introduced with Office 2007: https://en.wikipedia.org/wiki/Microsoft_Excel#Current_file_extensions Elektrizikekswerk (talk) 07:18, 14 April 2016 (UTC)

Might it additionally be not just the intrinsic complexity but the perceived complexity? Bigger spreadsheets like this easily become arcane and incomprehensible, esp with a general lack of comments or explanations of what cells and formulas mean or how they were reached, especially with multiple contributors. 172.68.58.137 23:35, 11 August 2017 (UTC)RJ K.

My organization has used Excel as a scheduling tool for at least 10 years. Management was using paper and an enterprising employee with a penchant for Excel decided to kick them into the computer age. Each hour of each day of a 24/7 operation is tracked, including which of 13 task positions is being worked by which employee. But it must contain room for other than regularly scheduled employees as well as future additions. And each Workbook covers 2 weeks at a stretch. All information must come together to generate several reports. So... 13 positions x a theoretical 100 employees = 1300 cells. Times 14 days = 18,200 cells. And each cell has a background formula to calculate from each hour of each 24 hr. day...formulas with 95 nested COUNTIFs. So there are 1,729,000 COUNTIF calculations running in the background and spitting out results on a bar chart and two totals reports. And this does not include the formulas running behind each daily sheet to provide staffing numbers for each hour around the clock. It has been an absolute beast to manage. I just recently finished overhauling the whole thing, improving appearance and organization, locking areas of each sheet to minimize formula corruption, and implementing conditional formatting to further reduce the need for the user input (half a dozen supervisory personnel, some of whom can recognize a computer when they see one) that has caused so much grief. But in the end there was no getting around the enormous formulas running in the background. Excel is NOT the right tool for this job. But we have been unable to determine a replacement. LostButMakingGoodTime (talk) 17:52, 1 July 2018 (UTC)

Why is 'schism' probably referring to the East-West Schism? There have been dozens of schisms for all sorts of reasons, from Arianism and Miaphysitism to recent splits in the Mormon Church and various other young denominations. The comic could (theoretically) be referring to any one of these schisms, or (more likely, in my view) it could be referring to the concept of a schism in general. Personally I think it's most likely a parody of modern church splits over outwardly petty-looking, seemingly trivial, or apparently overly worldly reasons. 172.70.90.228 03:03, 20 September 2023 (UTC)

Agreed. Reference deleted. - MeZimm 162.158.175.129 15:05, 23 August 2024 (UTC)