Difference between revisions of "2483: Linked List Interview Problem"

Explain xkcd: It's 'cause you're dumb.
(Linked lists are used in low level programming, but so are Vectors, the primary alternative. Linked list problems are typically seen to be archaic due to the existence of stable linked list libraries.)
(Explanation mallocs are slooow.)

Revision as of 07:41, 2 July 2021

Linked List Interview Problem
Title text: I'd traverse it myself, but it's singly linked, so I'm worried that I won't be able to find my way back to 2021.

Explanation


In computer programming, a linked list is a data structure that stores each item together with the memory address of the next item (and, in some variants, the previous one), establishing a relative ordering for a collection of data that may be scattered throughout memory. Several common software engineering interview questions involve manipulating or otherwise interacting with linked lists. Possibly because programmers today rarely work with linked lists directly, Randall suggests that such structures belong in a "technology museum," and that it would be more worthwhile to email the list to such a museum than to perform any useful work with it.

A linked list is a way to store sequential data in computer memory. Each piece of data is stored with a pointer to the next piece. This makes it very easy to add new data in the middle, since only one existing pointer must change to point to the new data. The drawback of a naive implementation is that finding a particular item may require following the chain from the start. Technical programming interviewers like to see whether applicants are familiar with the structure and with the concept of computational complexity.
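The insertion described above can be sketched in a few lines of Python. This is only an illustration: the `Node` class and the `insert_after` and `to_list` helpers are hypothetical names made up for this example, not anything taken from the comic.

```python
class Node:
    """One element of a singly linked list (hypothetical helper)."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next  # reference to the following node, or None at the end


def insert_after(node, value):
    """Splice a new node in after `node`; only `node.next` is rewired."""
    node.next = Node(value, node.next)
    return node.next


def to_list(head):
    """Walk the chain from the head to the end, collecting the values."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


# Build 1 -> 2 -> 4, then insert 3 after the node holding 2.
head = Node(1, Node(2, Node(4)))
insert_after(head.next, 3)
print(to_list(head))  # [1, 2, 3, 4]
```

Note that `insert_after` never touches the nodes holding 1 or 4, whereas `to_list` has to visit every node in turn.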

Linked lists are, historically, one of the two main data structures that represent sequential data, along with arrays. Unlike arrays, they have the theoretical advantage of O(1) insertions and deletions at a known position, since the rest of the structure never needs to be moved or reallocated, but they have O(n) random access (see comparisons). However, modern processors' caches favor data that are located next to each other, pre-fetching adjacent items, and modern processors can perform bulk memory moves, making resize operations faster. Finally, using linked lists usually implies dynamically allocating each list member, as opposed to reserving memory for many items in bulk and then using that memory as items are added. Memory allocation tends to be slow on modern systems, and the allocator's bookkeeping of which bytes belong to which item adds overhead that can be significant, particularly for smaller data items. Many small allocations also tend to fragment memory, which can leave it wasted and unavailable to the application later, particularly in long-running processes such as web servers. These properties tend to make linked lists poorly suited to most of the system programming applications in which a programmer might write algorithms to manipulate data structures instead of using existing libraries.
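The O(n) random access can be made concrete by counting pointer hops. The sketch below is an illustration under stated assumptions: `Node`, `build` and `nth` are hypothetical helpers, and CPython's built-in `list` (which is array-backed) stands in for the array. It says nothing about cache behavior or allocation cost, which Python hides.

```python
class Node:
    """One element of a singly linked list (hypothetical helper)."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def build(values):
    """Build a linked list holding `values` in order and return its head."""
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def nth(head, index):
    """Return (value, hops): the value at `index` and the links followed."""
    node = head
    hops = 0
    while hops < index:
        node = node.next
        hops += 1
    return node.value, hops


data = list(range(1000))
head = build(data)

print(data[750])       # array-backed list: one indexed lookup
print(nth(head, 750))  # linked list: 750 links followed to get there
```

The number of hops grows in step with the index, which is what O(n) random access means in practice.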

Modern programming languages usually provide abstractions (often named "array," "vector" or "list") which interact with the sequential data at the memory level, providing access to this data while using arrays, linked lists, hybrids of the two, or other approaches, and the programmer doesn't necessarily need to care one way or another. Knowing the underlying concepts is still useful, however, when creating fast-running code which scales well to large data, avoiding (e.g.) traversing the list over and over again, or performing particularly inefficient operations.

In the title text, a singly linked list contains pointers to traverse the list in only one direction; namely, from the head to the end. By contrast, each element in a doubly linked list contains pointers to both the "next" and "previous" elements, enabling traversal in either direction. Randall continues the implication that such lists are obsolete by implying that traversing such a list would be akin to time travel to the past. Without the "previous element" pointers, Randall is concerned he would not be able to reverse the time travel, as he could not traverse the list in the reverse direction.
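The difference between the two kinds of list can be sketched as follows. The `DNode` class and `build_doubly` function are hypothetical names for this illustration only; the point is that the "previous" references are what make the return trip possible.

```python
class DNode:
    """Node of a doubly linked list (hypothetical helper)."""

    def __init__(self, value):
        self.value = value
        self.prev = None  # a singly linked node would lack this reference
        self.next = None


def build_doubly(values):
    """Link the values in order and return (head, tail)."""
    head = tail = None
    for value in values:
        node = DNode(value)
        if tail is None:
            head = node
        else:
            tail.next = node
            node.prev = tail
        tail = node
    return head, tail


head, tail = build_doubly(["a", "b", "c"])

# Forward traversal only needs the `next` references.
forward = []
node = head
while node is not None:
    forward.append(node.value)
    node = node.next

# Walking back from the tail is only possible because of `prev`.
backward = []
node = tail
while node is not None:
    backward.append(node.value)
    node = node.prev

print(forward)   # ['a', 'b', 'c']
print(backward)  # ['c', 'b', 'a']
```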

Transcript


[Cueball is writing on a whiteboard while Ponytail stands next to him. Above the scene is a piece of code, apparently what Cueball is writing on the whiteboard. The text reads:]

   define traverseLinkedList(headPointer):
      myId="<illegible scribbling probably containing a user ID>"
      authToken="<illegible scribbling containing an auth token>"
      museumAddress="<illegible address>@<illegible domain>.<illegible tld>"
      client=mailRestClient(myID, authToken)
      client.messages.send(to=museumAddress,
      subj="Item donation?", body="Thought you
      might be interested: "+str(headPointer))
      return

Ponytail: Hey.

[Caption beneath the panel:] Coding interview tip: Interviewers get really mad when you try to donate their linked lists to a technology museum.



Discussion

Assuming not everyone understands O notation: O(1) means that it always takes the same time, no matter how much data is stored. O(n) means the time is proportional to the amount of data stored - if you have 10 times the data, it takes 10 times as long to find the one you want. 108.162.221.84 (talk) (please sign your comments with ~~~~)

This code won't mail the linked list to a museum - it will mail the memory location of the head of the list to a museum. 172.70.130.192 (talk) (please sign your comments with ~~~~)

I think part of the joke might be that the high-level language being used will actually spit out a representation of the entire list when using the str function. So it actually does all the traversing and abstracts it away, again making the interview question seem redundant! 162.158.159.48 10:40, 1 July 2021 (UTC)
The language looks almost like Python -- the only difference being the keyword define instead of def. Lisp is the only family of languages I can think of that automatically converts linked lists to a representation of all the elements, since the linked list is its fundamental data structure. Barmar (talk) 14:06, 1 July 2021 (UTC)
Haskell too: `headElem:tailList` is cons, https://wiki.haskell.org/How_to_work_on_lists#Notes_about_speed says "Haskell lists are ordinary single-linked lists." Solomon (talk) 01:34, 1 December 2022 (UTC)
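The point about `str` raised in this thread can be tried out in real Python (the comic's language only resembles it). In the sketch below, `Node` and `PrintableNode` are hypothetical classes: a plain object falls back to Python's default representation, which names the class and an address, while a class that defines `__str__` can do the traversal itself.

```python
class Node:
    """Plain singly linked node with no custom string conversion."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


class PrintableNode(Node):
    """Same node, but str() walks the whole chain (an assumed design)."""

    def __str__(self):
        parts = []
        node = self
        while node is not None:
            parts.append(str(node.value))
            node = node.next
        return " -> ".join(parts)


plain = Node(1, Node(2, Node(3)))
pretty = PrintableNode(1, PrintableNode(2, PrintableNode(3)))

print(str(plain))   # default representation: class name and an address
print(str(pretty))  # 1 -> 2 -> 3
```

So whether the museum receives the whole list or just a description of its head depends on how the list type was written.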

Just to make sure I get this right. If I want to save the numbers "1", "2", "3", "4" in an array it could (depending on the programming language) just be "[1,2,3,4]", while a linked list could be "1 (jump to 3rd entry), 4, 2 (jump to 4th entry), 3 (jump to 2nd entry)"? Then entering 2.5 between 2 and 3 would be complicated in the array as you have to move the 3 and 4 to new places, while in the linked list you just change the direction after 2 to jump to 5th entry, and add 2.5 and the instruction to jump to 4th entry? While it is of course harder to find a specific entry in the linked list. --Lupo (talk) 06:01, 1 July 2021 (UTC)

At the lowest level of access, such an array would be like the sequence "1234" (analogising to a simple string/char-array), asking for the nth-element quickly gets the nth-character (by offset plus suitably multiplied memory reference). Inserting ("12a34") or deleting ("124") needs at least partial shuffling and resizing, while switching ("1324") or other internal re-ordering has widely variable overheads.
A linked-list could be thought of as defining as "¹" with ¹="1²", ²="2³", ³="3⁴" and ⁴="4∅", taking up more initial memory, and effort to discover the nth item. But, done right and for the right reasons, additions (²="2⁵", ⁵="a³"), removals (²="2⁴", dump/reuse ³) and switches (either ²="3³", ³="2⁴" or ¹="1³", ³="3²", ²="2⁴") can be as efficient as possible once the splice-and-switch process knows which points to work with.
(A linked-list sorter/editor will probably traverse the list, not worrying what 'offset' it is at, but holding an ⁿ pointer address for at least two adjacent items, ready to alter their ⁿs-as-reference to fulfil the change required, without worrying which ⁿs they were, and when created in whatever the next memory slot is.)
Doubly-linked might be list header "¹" where ¹="∅1²", ²="¹2³", ³="²3⁴" and ⁴="³4∅" and is heavier in storage (though often balanced by the "1234" being much more complex as actual data (e.g. multi-word, possibly variable-length records) than the simple ⁿs, that in an array-accessed form would include far too much padding and wasting storage (or too little, requiring optionally-defined ⁿs at the end of each fixed-length record to direct to an 'overflow' memory location, effectively LLing) thus justifying the potential LL packing overheads).
For further hybrid fun, nothing stops you having a fixed array "¹²³⁴∅∅∅" and define ¹="1", etc, then change the array-of-references accordingly ("¹²⁵³⁴∅∅", "¹²⁴∅∅∅∅", "¹³²⁴∅∅∅" or - if it's sensible - "¹²³⁴³²¹" which actually does something the LL would be hard-pressed to achieve for you without further structural overheads specifically designed for beyond-linear traversal).
That it potentially becomes spaghetti-data should not concern you so long as you don't have spaghetti-code as well which causes some oversight of data-mangling to mess things up. And you'll probably want to maintain a custom data-dumper/collator/formatter capability to keep an eye on things as you're debugging the inevitably miswritten shuffle-function, and/or do battle with the compiler's garbage-handling insertions when you confuse it beyond reasonable limits. (No, wait, did you do full low-level garbage-handling yourself? Did you do it properly? ;) )
...but I must say I'm not overly keen to abandon modern inbuilt splice-functions (for arrays/otherwise) doing all this hard work for me. Only if I'm looking at something of more of a net-/tree-like relationship (esp. non-Euclidean), or something with complicated multi-layered disparity of pointed-at data might I design up from such basic foundations. But I can also be nostalgic about when it was far more necessary! 162.158.159.48 10:18, 1 July 2021 (UTC)
I'm sorry, but I found this *really* hard to understand, despite already knowing what linked lists are and how they work. Beanie talk 13:20, 3 April 2023 (UTC)
Being the one who wrote that, I can see what I was explaining but I'm not right now sure why I did... ;) So, for the latter, I do apologise. As for it being complicated, well... Linked Lists/etc are often somewhat complicated to implement/document, so I can't take any blame for that particular aspect of the universe. :P 14:01, 3 April 2023 (UTC)
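The layout in the question at the top of this thread ("1 (jump to 3rd entry), 4, 2 (jump to 4th entry), 3 (jump to 2nd entry)") can be written out directly. This sketch is only an illustration: it keeps the values and the "jump to" instructions in two parallel Python lists, uses 0-based positions instead of "3rd entry", and `walk` and `insert_after` are hypothetical helper names.

```python
# Storage order is 1, 4, 2, 3; the logical order comes from next_index.
values = [1, 4, 2, 3]
next_index = [2, None, 3, 1]  # None marks the end of the chain
HEAD = 0


def walk(values, next_index, head):
    """Follow the jump instructions from `head` and collect the values."""
    out = []
    i = head
    while i is not None:
        out.append(values[i])
        i = next_index[i]
    return out


def insert_after(values, next_index, i, value):
    """Store `value` in a new slot and rewire entry `i` to jump to it."""
    values.append(value)
    next_index.append(next_index[i])   # new entry jumps where i used to
    next_index[i] = len(values) - 1    # i now jumps to the new entry


print(walk(values, next_index, HEAD))  # [1, 2, 3, 4]

# Entry 2 holds the value 2, so this inserts 2.5 between 2 and 3.
insert_after(values, next_index, 2, 2.5)
print(walk(values, next_index, HEAD))  # [1, 2, 2.5, 3, 4]
print(values)                          # [1, 4, 2, 3, 2.5]
```

Nothing already stored had to move: 2.5 went into the next free slot and one existing jump instruction changed, as the question supposed.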

Does anyone know when the last comic was that used colors? Is this something worth mentioning? --162.158.88.42 06:11, 1 July 2021 (UTC)

I found the category: Category:Comics with color. --162.158.93.153 06:17, 1 July 2021 (UTC)

I added some words regarding the title text. Feel free to expand/clarify/correct as necessary. 172.69.35.209 06:57, 1 July 2021 (UTC)

The comic could also be a reference to the British Museum Algorithm. --162.158.88.110 09:09, 1 July 2021 (UTC)

I second a previous comment, the code *does not* send the list to the museum, only the string representation of the head pointer. So the examiner may be rightfully pissed off because both can be true: the candidate is trying to make fun of list algorithms and he doesn't know how to deal with a list. (Unsure of what follows: given that the code looks like python, this may also be sarcasm about the style of (not only) python programming that always resorts to some external code module instead of defining new data structures and coding related methods. In this case, the external module is a museum :-) ). Xkcdmax (talk)

Those wondering why linked lists are considered obsolete: insertion and deletion performance is rarely the issue these days. It's the cost of enumerating over all elements in the list. Both arrays and linked lists have O(n) complexity there, but arrays have the lower cost. And that's before we get into stuff like caches liking predictable access patterns (pointer chasing is not predictable) and all those pointers costing precious cache memory space.--Henke37 (talk) 09:45, 1 July 2021 (UTC)

If the elements are simpler and relatively constant in individual storage demands (regardless of total numbers to store), arrays and bulk-caching work well. If they're more convoluted records (e.g. up to 64 characters as element name, 256 characters for a description, version 'number' that's another string, a notes field that is a pointer to an arbitrary chain of formatted/markupped punctuated character-storing freetext variable slots, any number of other object properties you find useful) then most of the advantages of indexable layout for lookahead loading are lost. If you're already writing code at a significantly low level, then you could still possibly see an advantage to implementing linked-list structures and not lose out enough to the advantages you'd get for an array implementation.
Though these days you're not encouraged to tunnel past the abstractions the higher-level compiler/interpreter will present to you. You could be hard pressed to do anything efficient yourself (like an array-of-pointers approach, or using XOR packing to cut down on memory requirements in a doubly-linked list) and must blindly trust that the original authors of the intermediate builder gave it the wisdom to not be too bad trying to match what you input to a suitably workable pre-anticipated family of data-series methodologies by the time it gets to runtime.
And there's so much power in a modern computer core that, even with a resource-hogging OS, you're probably not going to break it by manually forcing the worst option, unless you're already in danger of stressing the system even with the truly best one. 141.101.99.93 23:44, 1 July 2021 (UTC)

Anyone else think the chosen color might be relevant? We're talking about **link**ed lists and the text is written in blue, the traditional color of hyper**link**s. In any other comic, I might think it a coincidence, but this is a comic that rarely uses color, and never without a purpose. Trlkly (talk) 07:15, 3 July 2021 (UTC)

Blue whiteboard pens are probably the more used 'not black' (because easier on the eye?) but not specifically hued (red for important/'do not do' information, green for softer suggestions or else with comparative 'do do' positive stuff). From personal experience. Not sure if this is relevant, maybe it's just that blue-on-white is what Randall overwhelmingly experiences when he casually wanders into NASA, JPL, CERN, NIF, Alphabet Inc, Apple Park, Redmond Campus, etc, and looks for casual inspiration on their various walls. 141.101.98.206 18:33, 3 July 2021 (UTC)

I think you're all missing the point of the joke: it's not the linked list itself but the interview question about linked lists that should be donated to the museum. A typical interview question is "how do you reverse a linked list?", with the interviewer expecting you to write down the algorithm where you walk down the list while creating a new linked list in the process, wiring up its "next" pointer to the previously visited element. For the first element you traverse, you set the "next" pointer of that element in the reversed list to nil, because it will be the last element in the reversed list. The final result is a pointer to the last visited element, which becomes the head of the reversed list. These kinds of questions are stereotypical for programmer interviews (just like "how do you swap two numbers without using a temporary variable?") and therefore Cueball makes a snarky remark that this question is now so archaic that it should be in a historical museum of sorts. 162.158.88.88 14:22, 5 July 2021 (UTC)

The text below the comic ("... donate their linked list ...") suggests the reading others have taken...
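For readers who have not met the "reverse a linked list" question described in this thread, one common answer looks roughly like this in Python. It is a sketch, not the only accepted solution: it rewires the existing nodes in place rather than building new ones, and `Node`, `reverse` and `to_list` are hypothetical names for this illustration.

```python
class Node:
    """One element of a singly linked list (hypothetical helper)."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def reverse(head):
    """Reverse the list in place and return the new head."""
    previous = None  # the first node visited ends up pointing at nothing
    while head is not None:
        following = head.next  # remember the rest of the original list
        head.next = previous   # point this node at the one visited before
        previous = head
        head = following
    return previous            # the last node visited is the new head


def to_list(head):
    """Walk the chain from the head to the end, collecting the values."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


head = Node(1, Node(2, Node(3)))
print(to_list(reverse(head)))  # [3, 2, 1]
```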