Difference between revisions of "1495: Hard Reboot"

(Insert allusion to 974.)

Revision as of 09:18, 6 March 2015

Hard Reboot
Title text: Googling inevitably reveals that my problem is caused by a known bug triggered by doing [the exact combination of things I want to do]. I can fix it, or wait a few years until I don't want that combination of things anymore, using the kitchen timer until then.

Explanation

This comic is about using a simple and unrelated trick to fix a problem.

Swap space is an area of a computer's hard drive reserved for use when the computer runs out of RAM. Ideally, RAM + SWAP >= MAX, where MAX is the most memory the computer will ever try to use at the same time. However, some [broken] programs may keep requesting memory from the system until the computer runs out of resources. Alternatively, the system may be misconfigured to run more and more programs simultaneously. Rebooting the computer will empty the RAM and swap space so resources can be reallocated, but this only temporarily alleviates the underlying issue. Determining the root cause of the problem is often nontrivial.
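As a rough illustration of the first troubleshooting step, the sketch below measures how full the swap space is. It assumes a Linux system that exposes SwapTotal and SwapFree in /proc/meminfo; the function names and the SAMPLE text are hypothetical and only there for illustration, and nothing in the comic says which operating system the server runs.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_used_fraction(meminfo):
    """Return the fraction of swap in use, or 0.0 if no swap is configured."""
    total = meminfo.get("SwapTotal", 0)
    free = meminfo.get("SwapFree", 0)
    if total == 0:
        return 0.0
    return (total - free) / total


# Hypothetical sample data, used when /proc/meminfo is not available.
SAMPLE = """MemTotal:        8000000 kB
MemFree:          200000 kB
SwapTotal:       2000000 kB
SwapFree:         500000 kB
"""

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # assumption: fall back to the sample on non-Linux systems
    info = parse_meminfo(text)
    print(f"swap used: {swap_used_fraction(info):.0%}")
```

Logging this value periodically would show whether swap usage climbs steadily, which hints at a leak, or jumps suddenly. It would not by itself identify the program responsible.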

It would take up to 10 hours to figure out why the server is running out of swap space and fix the problem. Alternatively, Randall could just take 5 minutes and plug the server into a light timer. This attitude to problem solving is in contrast to the attitude shown in 974: The General Problem.

Timers like the one in the comic typically have four switches or notches per hour, so using the timer would replace an unpredictable and indefinite loss of service with a regular 15-minute downtime event once a day. Also, it can be scheduled during, say, the middle of the night when most users are sleeping to minimize disruption.
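The cost of the timer approach can be worked out with a little arithmetic. The sketch below assumes the server is off for exactly one notch per day and ignores however long it takes to boot afterwards; the variable names are illustrative only.

```python
MINUTES_PER_DAY = 24 * 60   # 1440
NOTCHES_PER_HOUR = 4        # from the explanation: four notches per hour

# Smallest interval the timer can switch off for.
notch_minutes = 60 // NOTCHES_PER_HOUR

# Number of positions on the timer dial over 24 hours.
notches_per_day = 24 * NOTCHES_PER_HOUR

# Share of each day the server is powered off (assumption: one notch per day).
downtime_fraction = notch_minutes / MINUTES_PER_DAY
availability = 1 - downtime_fraction

# Total powered-off time over a 365-day year, in hours.
yearly_downtime_hours = notch_minutes * 365 / 60

print(f"notch length: {notch_minutes} minutes")
print(f"daily downtime: {downtime_fraction:.2%}")
print(f"availability: {availability:.2%}")
print(f"yearly downtime: {yearly_downtime_hours} hours")
```

Under these assumptions the server is unavailable for about 1% of the time, roughly 91 hours a year, in exchange for skipping the 1-10 hours of troubleshooting.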

The correct method of scheduling a regular reboot would be using a cron task, but perhaps the server is "crashing" in such a dramatic manner that cron, or shutdown, or init stops working. The comic title alludes to this, in that a "hard" reboot scheduled with an analog timer is more likely to work than a "soft" one scheduled with cron.

The title text's first sentence refers to situations where the given solution to a problem is just the original problem rephrased to sound like a solution. It may also refer to bug trackers, where someone found out and posted what causes the issue, but the bug is marked as "Unresolved," "Waiting," or "Will not fix."

The title text's second sentence is about the human tendency to wait on someone else to fix a problem rather than doing it yourself. Since everyone is waiting on someone else to do it, such problems tend to never be fixed, hence the solution of "wait[ing] a few years until I don't want that combination of things anymore."

Transcript

A section of a screen with a white-on-black color scheme is shown. The screen is covered in lines of illegible text.

Figuring out why my home server keeps running out of swap space and crashing: 1-10 hours

Next to the section of the screen is a timer plugged into a power port with cable running off to the side.

Plugging it into a light timer so it reboots every 24 hours: 5 minutes

Why everything I have is broken



Discussion

My interpretation is that the 1-10 hours is how long it would take to troubleshoot the problem and the 5 minutes is how long it would take to get kitchen timer and put into socket. So slides are showing the two solutions (one techy and liable to take up to 10 hours vs. the hacky but fast solution). ‎108.162.225.118 (talk) (please sign your comments with ~~~~)

At first I thought the ten hours was troubleshooting, but 5 minutes sounds about right for the granularity of the timer. Mikemk (talk) 06:51, 6 March 2015 (UTC)

"Of course, the problem could be solved without a reboot simply by increasing the swap size." My understanding is that the SWAP is overflowing and not just 'too little'. So no, simply increasing the swap size wouldn't solve the problem. 173.245.53.214 07:36, 6 March 2015 (UTC)

I agree, and have removed that sentence, because there is no way to be sure that increasing the swap size will help. In fact increasing the swap size is the first step down the '1-10 hours to troubleshoot' path. --Pudder (talk) 08:52, 6 March 2015 (UTC)
I think it deserves mention. Mikemk (talk) 09:37, 6 March 2015 (UTC)

"Also, it can be scheduled during, say, the middle of the night when most users are sleeping to minimize disruption." That would be so annoying in my case. I'm glad Randall has a better discipline of schedule than me, with my Windows NT machine which these days definitely needs its manual weekly reboot and really needs to be functionally replaced except for all the additional fuss it'd require. (Also, I'm not sure about the "first sentence of the title text" bit, as currently stated, but doubtless it'll all be adjusted slightly.) 141.101.98.181 12:02, 6 March 2015 (UTC)

I would recommend 5:00 (am). It's nowhere near the middle of the night, but it's the time when it's most probable everyone is sleeping. Alternatively, considering it's just HIS router, he should know his sleeping patterns ... -- Hkmaly (talk) 12:11, 6 March 2015 (UTC)
When a reboot is least disruptive also depends on whether the machine is being used by users in other time zones. It really annoys me when I'm presented with "Server is down for scheduled maintenance", and the powers that be have decided that the best time to do that is in the middle of the day (for me). --RenniePet (talk) 12:42, 6 March 2015 (UTC)
Of course, if you tend to observe a 28-Hour Day it gets tricky to schedule (on a daily basis, at least). Yes, I used to do something like that, a couple of decades ago. (And my mind/body still wants to do it! Hence why even 5:00am would be awkward for me. More often than is convenient, anyway.) 141.101.98.188 11:27, 9 March 2015 (UTC)
My reaction to the solution (instead of using cron) was similar to when I see somebody emailing a photo by embedding it in a word document. I guess Randall did that on purpose! 141.101.98.195

Re: "Why everything I have is broken" - I think a better explanation would be that by applying some workarounds you can use broken things without actually fixing them. E.g. you can use a server with a memory leak without spending 10+ hours fixing the problem. Using this approach you can end up with a bunch of broken things that are still useful. -- Jkotek (talk) (please sign your comments with ~~~~)

This was my understanding of the statement as well. 108.162.216.192 16:25, 6 March 2015 (UTC)

I think the "Why everything I have is broken" text refers to the fact that he has spent 10 hours troubleshooting the problem, then implements a hacky fix in 5 minutes which just makes the problem worse - hard rebooting a server every day is not likely to fix the problem and will probably make it worse, and the server will ultimately break. 141.101.99.87 14:37, 6 March 2015 (UTC)

"The title text's first sentence refers to situations where the given solution to a problem is just the original problem rephrased to sound like a solution." I don't think that's right... it makes it sound like the solution to the problem is to not have the problem, but the first sentence of the title text doesn't reference a solution at all. It's just noting that there's no point in the user looking around for other posts because this is exactly what he's getting, so if there's no solution for this problem then the problem can't be solved. 108.162.219.105 14:05, 6 March 2015 (UTC)

Thank you for the description! I was reading the 1-10 hours as the time it took for the system to crash, and the 5 minutes as the on-off time -- which obviously conflicted with the 24 hours text in the comic. This makes so much more sense now. =8o) Jarod997 (talk) 14:42, 6 March 2015 (UTC)

Should one of us ask Randall if he can tell us which bug this is (assuming it exists), or do the square brackets purposely ask that we should stifle our curiosity? Assuming it's an open-source project, this is an opportunity for readers to make a difference, rather than just humor (cf. "Randallism"). Chrstphrchvz (talk) 22:21, 6 March 2015 (UTC)

I don't think it's a specific bug. It's just humour. Mikemk (talk) 01:26, 7 March 2015 (UTC)
It might not only be humour. I can say we use the latter technique for a router. Mark Hurd (talk) 01:49, 7 March 2015 (UTC)
It's xkcd, it's never just humor. :P
By the way, more than hackish, this is just a plain sloppy duck-tape solution. (I mean, at least use a cron job!) 173.245.48.133 18:31, 7 March 2015 (UTC)


In the title text, it's THE kitchen timer, not A kitchen timer, so it may mean "the (light) timer from the kitchen"...188.114.110.47 12:03, 8 March 2015 (UTC)

The way the letters are squished together and its general untidiness make the cartoon look rather old. I wondered if it was one Randall dug out that he had drawn a while ago. 108.162.249.182 22:22, 8 March 2015 (UTC)

Why is there a description of 3-pronged plugs in the explanation? It seems to have no context. Djbrasier (talk) 13:38, 9 March 2015 (UTC)