Editing 2048: Curve-Fitting



Line 21: Line 21:
 
<math>f(x) = mx + b</math>
 
  
{{w|Linear regression}} is the most basic form of regression; it tries to find the straight line that best approximates the data. Because it is the simplest and most widely taught form of regression, and because differentiable functions are in general well approximated locally by a straight line, it is usually the first and most trivial fit to try.
  
The picture to the right shows how totally different data sets can result in the same line. Some basic knowledge about the nature of the data is clearly needed to judge whether this simple line really makes sense.
  
 
The comment below the graph ''"Hey, I did a regression."'' refers to the fact that this is just the easiest way of fitting a curve to data.
 
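
As a rough illustration of the point about different data sets sharing one line, the following sketch fits <math>f(x) = mx + b</math> by ordinary least squares. The function name <code>fit_line</code> and both data sets are made up for this example and are not taken from the comic.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of f(x) = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together, around their means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


# Two hypothetical data sets with very different shapes.
xs = [0, 1, 2, 3, 4]
straight = [1, 3, 5, 7, 9]   # lies exactly on y = 2x + 1
curved = [3, 2, 3, 6, 11]    # clearly bends upward like a parabola

line_for_straight = fit_line(xs, straight)
line_for_curved = fit_line(xs, curved)
print(line_for_straight, line_for_curved)
```

Both calls return a slope of 2 and an intercept of 1, even though only the first data set actually looks like a line.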
Line 32: Line 32:
 
{{w|Polynomial regression|Quadratic fit}} (i.e. fitting a parabola through the data) uses the lowest-degree polynomial that can fit a curved line to data; if the data exhibits clearly "curved" behavior (or if the experimenter feels that its growth should be more than linear), a parabola is often the first and easiest stab at fitting the data.
 
  
The comment below the graph ''"I wanted a curved line, so I made one with math."'' suggests that a quadratic regression is used when straight lines no longer satisfy the researcher, but they still want to use a simple mathematical expression. Quadratic fits like this are mathematically valid, and a parabola is one of the simplest kinds of curve in math, but this curve doesn't appear to fit the data any better than simple linear regression does.
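
One way to check whether a parabola really fits better than a line is to compare the sum of squared residuals of both fits. The sketch below does this with a small least-squares polynomial fit built on the normal equations; the helper names and the scattered data are invented for illustration. Because a line is a special case of a parabola, the quadratic residual can never be larger, so what matters is whether it is ''meaningfully'' smaller.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, copied so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ...] of c0 + c1*x + c2*x^2 + ..."""
    size = degree + 1
    normal = [[sum(x ** (i + j) for x in xs) for j in range(size)]
              for i in range(size)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve_linear_system(normal, rhs)


def sum_squared_residuals(coeffs, xs, ys):
    """Total squared distance between the data and the fitted polynomial."""
    total = 0.0
    for x, y in zip(xs, ys):
        predicted = sum(c * x ** k for k, c in enumerate(coeffs))
        total += (y - predicted) ** 2
    return total


# Hypothetical scattered data with a loose upward trend.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1.2, 0.4, 2.9, 2.1, 4.8, 3.5, 5.1, 6.3]

linear_error = sum_squared_residuals(fit_polynomial(xs, ys, 1), xs, ys)
quadratic_error = sum_squared_residuals(fit_polynomial(xs, ys, 2), xs, ys)
print(linear_error, quadratic_error)
```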
  
 
===Logarithmic===
 
Line 46: Line 46:
 
<math>f(x) = a\cdot b^x</math>
 
  
An {{w|Exponential growth|exponential curve}}, by contrast, is typical of a phenomenon whose growth gets rapidly faster and faster; a common case is a process that generates something that contributes to the process itself, such as bacterial growth or compound interest.
  
 
The logarithmic and exponential interpretations could very easily be fudged or engineered by a researcher with an agenda (such as by taking a misleading subset or even outright lying about the regression), which the comic mocks by juxtaposing them side-by-side on the same set of data.
 
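
A common shortcut for fitting <math>f(x) = a\cdot b^x</math> is to take logarithms, since <math>\ln f(x) = \ln a + x \ln b</math> is a straight line in <math>x</math>. The sketch below assumes all data values are positive and uses a made-up compound-interest series; the name <code>fit_exponential</code> is hypothetical. Note that this minimizes the error in log space, which is not the same as a direct least-squares fit of the exponential itself.

```python
import math


def fit_exponential(xs, ys):
    """Fit f(x) = a * b**x via a straight-line fit to (x, ln y). Needs y > 0."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs))
    slope = sxy / sxx                      # estimates ln b
    intercept = mean_log - slope * mean_x  # estimates ln a
    return math.exp(intercept), math.exp(slope)


# Hypothetical example: a balance of 100 growing by 5% per period.
periods = list(range(6))
balances = [100 * 1.05 ** t for t in periods]

a, b = fit_exponential(periods, balances)
print(a, b)
```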
Line 53: Line 53:
  
 
===LOESS===
 
A {{w|Local regression|LOESS fit}} doesn't use a single formula to fit all the data, but approximates the data locally, using a different polynomial for each "zone" (giving data points less weight the further they are from the zone's center) and patching the pieces together. As it has many more degrees of freedom than a single polynomial, it generally "fits better" to any data set, although it is generally impossible to derive any strong, "clean" mathematical relationship from it; it is just a nice smooth line that approximates the data points well, with a good degree of outlier rejection.
  
The comment below the graph ''"I'm sophisticated, not like those bumbling polynomial people."'' emphasises this more complicated approach, but without a simple mathematical description it is not very helpful for finding informative interpretations of the underlying data.
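
The local-fitting idea can be sketched in a few lines. The version below is a simplified illustration, not a full LOESS implementation: it fits a weighted straight line (degree 1) around each point using tricube weights and omits the extra reweighting passes that real implementations use to resist outliers. The function names and the <code>frac</code> parameter (the share of points used in each local fit) are assumptions made for this example.

```python
import math


def tricube(u):
    """Tricube weight: 1 at the center, falling smoothly to 0 at u = 1."""
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_point(x0, xs, ys, frac=0.5):
    """Smoothed value at x0 from a weighted straight-line fit to nearby points."""
    n = len(xs)
    k = min(n, max(3, math.ceil(frac * n)))
    # Window half-width: distance from x0 to its k-th nearest data point.
    h = sorted(abs(x - x0) for x in xs)[k - 1] or 1.0
    weights = [tricube(abs(x - x0) / h) for x in xs]
    sw = sum(weights)
    if sw == 0.0:
        # Degenerate window (all neighbours sit exactly on its edge):
        # fall back to equal weights inside the window.
        weights = [1.0 if abs(x - x0) <= h else 0.0 for x in xs]
        sw = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / sw
    mean_y = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    if sxx == 0.0:
        # Only one distinct x carries weight, so no slope can be estimated.
        return mean_y
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, xs, ys))
    return mean_y + (sxy / sxx) * (x0 - mean_x)


def loess_curve(xs, ys, frac=0.5):
    """Evaluate the local fit at every data point and return the smoothed ys."""
    return [loess_point(x, xs, ys, frac) for x in xs]


# Hypothetical data: on perfectly linear input the smoother returns the line.
xs = list(range(11))
ys = [2 * x + 1 for x in xs]
smoothed = loess_curve(xs, ys)
print(smoothed)
```

Each output value comes from its own small fit, which is why the result is a smooth curve rather than a single formula.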
  
 
===Linear, No Slope===
 
