Difference between revisions of "3062: Off By One"

Jump to: navigation, search
(Fixed use of "complimentary" for "complementary")
(Explanation: add tangible explanations of three error classes; also fixed typo "fhis")
Line 12: Line 12:
 
{{incomplete|The explanation is currently too hard to understand for non-technical readers.}}
 
{{incomplete|The explanation is currently too hard to understand for non-technical readers.}}
  
In computer programming and computer science, an {{w|off-by-one error}} is a very common human mistake left by an engineer who instructed the computer to process one too many or one too few items than are actually present. This can arise from a number of sources, including:
+
In computer programming and computer science, an {{w|off-by-one error}} is a very common human mistake made by an engineer who instructed the computer to process one too many or one too few items than are actually present. This can arise from a number of sources, including:
*Mistakenly using a ≤ (less than or equals) comparison where a < (less than) comparison was needed to terminate a {{w|Loop (programming)|loop}}, or vice versa (or, alternately, ≥ mixed up with >)
+
*Mistakenly using a ≤ (less than or equals) comparison where a < (less than) comparison was needed to terminate a {{w|Loop (programming)|loop}}, or vice versa (or, alternately, ≥ mixed up with >) (Imagine a literal interpretation of "when you get to the last task you are done" as opposed to "when you have completed the last task you are done".)
*Confusion between {{w|Zero-based numbering|zero-}} and one-based indexing of {{w|Array (data type)|arrays}} in code, either by convention or by definition in the code. Often, when numbering (indexing) elements in programming, counting starts from 0, so the initial element is not the "first" but actually the "zeroth". When using an {{w|Array (data structure)|array data structure}}, where elements are stored at equally spaced {{w|memory address}}es, zero-indexing lets you calculate the address of the <var>i</var>th element more easily: address(<var>i</var>) = address(<var>array</var>) + <var>spacing</var> &times; <var>i</var>. However, not all programming languages do this.
+
*Confusion between {{w|Zero-based numbering|zero-}} and one-based indexing of {{w|Array (data type)|arrays}} in code, either by convention or by definition in the code. Often, when numbering (indexing) elements in programming, counting starts from 0, so the initial element is not the "first" but actually the "zeroth". When using an {{w|Array (data structure)|array data structure}}, where elements are stored at equally spaced {{w|memory address}}es, zero-indexing lets you calculate the address of the <var>i</var>th element more easily: address(<var>i</var>) = address(<var>array</var>) + <var>spacing</var> &times; <var>i</var>. However, not all programming languages do this. (More tangibly, imagine a set of detailed instructions for washing dishes using a loop that starts with "put away any dish in your hands and pick up the next dish".  If you read the instructions with a dish already in your hand, that first dish will never get washed.)
*{{w|Fencepost error}}s (which some consider to be synonymous with off-by-one errors). These are often related to the norm, in programming language loop constructs, to require inclusivity on the lower side of the range but exclusivity on the upper side [<var>min</var> ≤ <var>i</var> < <var>max</var>). Consider the length of a range, such as the whole numbers from 4 to 6. If this is considered inclusive on one side and exclusive on the other, then the correct length is 2, to count the set {4, 5}. This is easily obtained via the subtraction <var>max</var> &minus; <var>min</var>. However, if the range is considered inclusive on both sides, as when placing fenceposts to hold a length of fence, then the correct length is 3, to count the set {4, 5, 6}. This is one more than the difference, and one more than the length usually used in software engineering, so the formula for this case is actually <var>max</var> &minus; <var>min</var> + 1.
+
*{{w|Fencepost error}}s (which some consider to be synonymous with off-by-one errors). These are often related to the norm, in programming language loop constructs, to require inclusivity on the lower side of the range but exclusivity on the upper side [<var>min</var> ≤ <var>i</var> < <var>max</var>). Consider the length of a range, such as the whole numbers from 4 to 6. If this is considered inclusive on one side and exclusive on the other, then the correct length is 2, to count the set {4, 5}. This is easily obtained via the subtraction <var>max</var> &minus; <var>min</var>. However, if the range is considered inclusive on both sides, as when placing fenceposts to hold a length of fence, then the correct length is 3, to count the set {4, 5, 6}. This is one more than the difference, and one more than the length usually used in software engineering, so the formula for this case is actually <var>max</var> &minus; <var>min</var> + 1. (A more concrete explanation: imagine that you are building a fence, with 1 meter panels between fenceposts.  To build a 10 meter fence you need 11 fenceposts, as you need an "extra" one at the end.  If you assume that a 10 meter fence needs 10 fenceposts, one per panel, you have a fencepost error -- the correct algorithm is that you need one fencepost plus one per panel.)
  
[[Cueball]] has attempted to combat off-by-one errors in his new programming language by introducing off-by-40-to-50 errors, which will indeed ensure that a simple reference to a value is never off by one (only a further, ''nearly'' complementary invocation of the ±[40..50] error will produce such a seemingly simple error). This severe change would introduce immediate failures in almost every program that does not have efficient error-catching and fault-tolerance. A programmer attempting to correct such failures would end up completely removing direct comparison with the end value of a range, the usual cause of off-by-one errors, but would have to account for the possibilities of all of their values being at least 50 off from the intended value in every elemental variable read/write (possibly multiple cases of fhis per statement, or line). For example, a normal loop can miss many elements (and possibly revisit others more than once, and/or out of order), not just possibly omit or over-extend one of the endpoints.
+
[[Cueball]] has attempted to combat off-by-one errors in his new programming language by introducing off-by-40-to-50 errors, which will indeed ensure that a simple reference to a value is never off by one (only a further, ''nearly'' complementary invocation of the ±[40..50] error will produce such a seemingly simple error). This severe change would introduce immediate failures in almost every program that does not have efficient error-catching and fault-tolerance. A programmer attempting to correct such failures would end up completely removing direct comparison with the end value of a range, the usual cause of off-by-one errors, but would have to account for the possibilities of all of their values being at least 50 off from the intended value in every elemental variable read/write (possibly multiple cases of this per statement, or line). For example, a normal loop can miss many elements (and possibly revisit others more than once, and/or out of order), not just possibly omit or over-extend one of the endpoints.
  
 
The title text states the obvious; by changing every number by 40 to 50, any number will be 40 to 50 off. This will compound further as the change can happen many times on a single line, as well as further by existing unforced off-by-one errors (or itself being left ill-defined or misunderstood, depending on whether "between" here is being understood as inclusive or exclusive) leading to potential off-by-39-to-51 errors.
 
The title text states the obvious; by changing every number by 40 to 50, any number will be 40 to 50 off. This will compound further as the change can happen many times on a single line, as well as further by existing unforced off-by-one errors (or itself being left ill-defined or misunderstood, depending on whether "between" here is being understood as inclusive or exclusive) leading to potential off-by-39-to-51 errors.

Revision as of 21:39, 25 March 2025

Off By One
It does come at the small cost of a LOT more off-by-40-or-50 errors.
Title text: It does come at the small cost of a LOT more off-by-40-or-50 errors.

Explanation

Ambox notice.png This explanation is incomplete:
The explanation is currently too hard to understand for non-technical readers. If you can fix this issue, edit the page!

In computer programming and computer science, an off-by-one error is a very common human mistake made by an engineer who instructed the computer to process one too many or one too few items than are actually present. This can arise from a number of sources, including:

  • Mistakenly using a ≤ (less than or equals) comparison where a < (less than) comparison was needed to terminate a loop, or vice versa (or, alternately, ≥ mixed up with >) (Imagine a literal interpretation of "when you get to the last task you are done" as opposed to "when you have completed the last task you are done".)
  • Confusion between zero- and one-based indexing of arrays in code, either by convention or by definition in the code. Often, when numbering (indexing) elements in programming, counting starts from 0, so the initial element is not the "first" but actually the "zeroth". When using an array data structure, where elements are stored at equally spaced memory addresses, zero-indexing lets you calculate the address of the ith element more easily: address(i) = address(array) + spacing × i. However, not all programming languages do this. (More tangibly, imagine a set of detailed instructions for washing dishes using a loop that starts with "put away any dish in your hands and pick up the next dish". If you read the instructions with a dish already in your hand, that first dish will never get washed.)
  • Fencepost errors (which some consider to be synonymous with off-by-one errors). These are often related to the norm, in programming language loop constructs, to require inclusivity on the lower side of the range but exclusivity on the upper side [mini < max). Consider the length of a range, such as the whole numbers from 4 to 6. If this is considered inclusive on one side and exclusive on the other, then the correct length is 2, to count the set {4, 5}. This is easily obtained via the subtraction maxmin. However, if the range is considered inclusive on both sides, as when placing fenceposts to hold a length of fence, then the correct length is 3, to count the set {4, 5, 6}. This is one more than the difference, and one more than the length usually used in software engineering, so the formula for this case is actually maxmin + 1. (A more concrete explanation: imagine that you are building a fence, with 1 meter panels between fenceposts. To build a 10 meter fence you need 11 fenceposts, as you need an "extra" one at the end. If you assume that a 10 meter fence needs 10 fenceposts, one per panel, you have a fencepost error -- the correct algorithm is that you need one fencepost plus one per panel.)

Cueball has attempted to combat off-by-one errors in his new programming language by introducing off-by-40-to-50 errors, which will indeed ensure that a simple reference to a value is never off by one (only a further, nearly complementary invocation of the ±[40..50] error will produce such a seemingly simple error). This severe change would introduce immediate failures in almost every program that does not have efficient error-catching and fault-tolerance. A programmer attempting to correct such failures would end up completely removing direct comparison with the end value of a range, the usual cause of off-by-one errors, but would have to account for the possibilities of all of their values being at least 50 off from the intended value in every elemental variable read/write (possibly multiple cases of this per statement, or line). For example, a normal loop can miss many elements (and possibly revisit others more than once, and/or out of order), not just possibly omit or over-extend one of the endpoints.

The title text states the obvious; by changing every number by 40 to 50, any number will be 40 to 50 off. This will compound further as the change can happen many times on a single line, as well as further by existing unforced off-by-one errors (or itself being left ill-defined or misunderstood, depending on whether "between" here is being understood as inclusive or exclusive) leading to potential off-by-39-to-51 errors.

The title text can be used to infer that the 40-to-50 range is inclusive on both ends, despite the word "between" also possibly implying an exclusive range, in different contexts.

Transcript

[Cueball faces Ponytail and Hairy while pointing behind him towards a laptop computer standing on a small desk.]
Cueball: Any time an integer is stored or read, its value is adjusted upward or downward by a random amount between 40 and 50.
[Caption below the panel:]
My new language almost completely eliminates off-by-one errors.

comment.png  Add comment      new topic.png  Create topic (use sparingly)     refresh discuss.png  Refresh 

Discussion

But what about floats? GreyFox (talk) 20:01, 12 March 2025 (UTC)

Is this dithering? Hcs (talk) 21:19, 12 March 2025 (UTC)

Could be. --PRR (talk) 22:19, 12 March 2025 (UTC)

This language has a huge off by one error: the docs don't explicitly say if the random range is inclusive. EDIT: the comic description above now includes this, thx --Snaxmcgee (talk) 22:22, 12 March 2025 (UTC)

But if it's adjusted both on store and on read, then there is a chance (of about 1 in 22) that the value after read will be exactly the same as the value before store. This does not eliminate pre-existing off-by-one errors, and in fact, introduces new ones if the adjustment on read is off by one from the adjustment on store, when there was no off-by-one error in the original code. And what's worse - with a single store-read cycle, the value can never be off by 40 to 50. It can be off by up to 10, or by between 80 to 100, in either direction. --NeatNit (talk) 22:42, 12 March 2025 (UTC)

I was just adjusting the explanation to imply this sort of thing (without having read your comment, just yet). Given the assumption that n=n±(40+rand(11)) at every stage (I'm assuming 'inclusive', Snaxmcgee!), two steps of 'intentional adjustment' might result in: -100 (x1), -99 (x2), -98 (x3), -97 (x4), -96 (x5), -95 (x6), -94 (x7), -93 (x8), -92 (x9), -91 (x10), -90 (x11), -89..-80 (x10..x1), -10 (x2), -9 (x4), -8 (x6), -7 (x8), -6 (x10), -5 (x12), -4 (x14), -3 (x16), -2 (x18), -1 (x20), ±0 (x22), +1..+10 (x20..x2), +80..+90..+100 (x1..x11..x1).
This gives a chance of being entirely correct as 22/484 (4.5454...%) and each off-by-one as very slightly less (though ±1, in total is almost twice as likely!).
Adding further steps (skipping odd step-cummulations, at least at first, until you get to nine of them and everything entirely stops being discontinuous) just spreads out an increased number of highs right next to zero deflection... 172.70.86.129 23:38, 12 March 2025 (UTC)
Obligatory quote:
There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.
See here for a full story of this quote: https://twitter.com/codinghorror/status/506010907021828096
--162.158.129.64 08:28, 13 March 2025 (UTC)
And 3 hard things in distributed computing: 3. Delivering messages exactly one time, 2. Making sure things happen in the correct order, and 3. Delivering messages exactly one time Jamcdonald (talk)
Presumably 1 is not losing data? --NeatNit (talk) 10:19, 13 March 2025 (UTC)
We may never know, Message 1 was never delivered Jamcdonald (talk)

In the comic, Cueball clearly says the adjustment amounts is ‘’between’’ 40 and 50, yet this explanation says the adjustment is from 40 to 50, ironically making an off-by-1 error on both ends of the range. Neither integers 40 nor 50 are “between 40 and 50”. 172.71.154.39 10:43, 13 March 2025 (UTC)

English language is imprecise with its use of "between", but it's usually taken as inclusive. Most people, when asked, "Pick a number between 1 and 10," will assume that 1 and 10 are both valid choices. Even in computing, you have things like Excel's RANDBETWEEN function to generate random integers between two bounds, which is inclusive. 104.23.187.72 (talk) 13:28, 13 March 2025 (please sign your comments with ~~~~)
Interestingly, in German such ranges are defined as including the borders, in Dutch they're defined as excluding the borders. (hence the Dutch t/m ("tot en met" - "up to, and including") 104.23.170.81 15:28, 13 March 2025 (UTC)
Yes, see between#Usage notes as one overview. Between as in "within the bounds defined by" is different from "amongst those things of which these items are the defining outer examples". Especially, but not exclusively, when that's just two distinct items which have no valid intermediate states betwixt the two to choose from ("you have to choose between me and my sister" isn't usually satisfactoraily answerable by choosing a different sibling, or perhaps parent, of the two). 172.69.195.54 15:24, 13 March 2025 (UTC)

It's easy to make an off-by-one error without using a computer at all. Ask a friend how many fenceposts are needed for a 100-foot fence if the rails are ten feet long. 172.71.30.199 12:58, 13 March 2025 (UTC)

And how wide are the posts..? ;) 141.101.98.6 15:27, 13 March 2025 (UTC)
If I ever answer this question 100, it's obviously because in my mind each post (or more accurately, the distance between the grooves cut into the post) is a foot wide, which is slightly larger than usual but not totally unreasonable size, and not because I fell for the trick.--Snaxmcgee (talk) 16:47, 13 March 2025 (UTC)

I don't understand - what's all this got to do with water balloons? 172.70.86.129 15:37, 13 March 2025 (UTC)

(ISWYDT...) 171.68.193.204 16:47, 14 February 2026 (UTC)
What did they do. Waterballoons were the previous comic? What did they do? 172.70.230.159 22:41, 13 March 2025 (UTC)
They were off by one [day] 172.68.55.113 16:35, 15 March 2025 (UTC)

Let's make programming languages do this! There should be an implementations section! :D :D 172.70.230.159 22:41, 13 March 2025 (UTC)

Nearly added the following information to the Explanation, but it's probably too specific an example:

The language Perl can be asked (with an array @array) for either the number of elements (with something like int(@array)) or the index of the final element (typically by $#array) of lists that usually start at the index of zero. The method of looping through the array should then be carefully matched to the limit given (shift/pop it as many times as there are elements, as one of the various "do-while" or "while-do" types of loop available, or else run from index zero to the topmost index value that normallybought to be 'elements minus one', with one of the more "for(;;)"-style counting-loops).

And, of course, there are subtle differences between Perl versions, not nust limited to whetger it's $array[$index] or @array[$index], and if $[ or 'arybase' can be used to mix things up a bit. (Though I might just map {...} @array it, or do something potentially awful but valid like {@array||last;print$".splice@array,rand@array,1;redo} if it seemed like I should.) 172.70.163.167 00:33, 14 March 2025 (UTC)
      comment.png  Add comment