
{{comic
| number    = 1838
| date      = May 17, 2017
| title     = Machine Learning
| image     = machine_learning.png
| titletext = The pile gets soaked with data and starts to get mushy over time, so it's technically recurrent.
}}

==Explanation==
{{incomplete|Work in progress. <s>This explanation is an attempt at {{w|design by committee|machine learning by committee}}.</s>}}

This comic compares a machine learning system to a compost pile.

{{w|Machine learning}} is a method employed in the automation of complex tasks. It usually involves creating algorithms that use statistical analysis of data and pattern recognition to generate output. The accuracy of the output can be used as feedback to adjust the system, usually making future results statistically better.

{{w|Composting}} is the process of taking organic matter, such as food and yard waste, and allowing it to decompose into a form that serves as fertilizer. A common method of composting is to mound the organic matter in a pile with a certain amount of moisture, then "stir" the pile occasionally to move the less-decomposed material from the top to the interior of the pile, where it will decompose faster.

In this comic, Cueball explains his machine learning system to a Cueball-like guy. The system consists of a pile of mathematical functions with an input funnel (labelled "data") at one end and an output box (labelled "answers") at the other. Cueball himself appears to be a functional part of this system, as he stands atop the pile stirring it with a paddle.

In the comic, data is poured into a funnel, passes through a mess of linear algebra, and comes out as answers. The main joke is that, although this description is vague and gives no intuition or detail about how the system works, it is close to the level of understanding most machine learning experts have of the most popular class of techniques in machine learning, namely deep learning with neural networks. <!--''(Why reference to neural networks here? They are non-linear. A better example is support vector machines.)''-->

''One of the most popular paradigms of machine learning is supervised learning, where a function mapping an input to an output is learned from many input-output pairs, e.g. a function mapping images of faces to people's names, learned from a dataset of labelled images. Classic machine learning techniques like linear regression or logistic regression have understandable parameters and provable algorithms, but require significant engineering in the pre-processing step and don't perform very well on data like images or natural text. Deep learning techniques, on the other hand, require very little pre-processing, but run the data through several steps of linear algebra, where in each step the output of the previous step is multiplied by a matrix, usually passed through a simple non-linear function, and sent to the next step. This multi-step process has proven to be very successful for image and text data, but the structure of the parameters, arranged as a matrix for each step, allows for very little interpretation, and can only be described as "data going through a pile of linear algebra".''
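The "several steps of linear algebra" can be sketched in a few lines of Python. This is only an illustration, not the comic's or any library's implementation: the function names <code>matvec</code>, <code>relu</code> and <code>forward</code>, the choice of ReLU as the non-linear function, and the weight values are all assumptions made up for this example.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(vector):
    """A simple non-linear function: negative values become zero."""
    return [max(0.0, v) for v in vector]


def forward(layers, data):
    """Pour the data through the pile: one matrix multiplication per step."""
    activation = data
    for weights in layers[:-1]:
        activation = relu(matvec(weights, activation))
    # The last step is left linear so the answers can take any value.
    return matvec(layers[-1], activation)


# Hypothetical weights, chosen by hand; a real network would learn them.
layers = [
    [[1.0, -1.0], [0.5, 0.5]],  # step 1: 2 inputs -> 2 hidden values
    [[1.0, 2.0]],               # step 2: 2 hidden values -> 1 answer
]
answers = forward(layers, [3.0, 1.0])
print(answers)
```

Nothing in the individual numbers inside <code>layers</code> says what they "mean", which is the interpretability problem described above.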

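Such deep neural networks are typically trained by gradient descent, which repeatedly makes small adjustments to the parameters in the direction that reduces the error in the output. This can be viewed as "stirring the pile of linear algebra until the answers start looking right".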
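As a rough illustration of "stirring until the answers look right", here is a minimal sketch of gradient descent fitting a single parameter <code>w</code> so that <code>w * x</code> matches some example answers. The data, the learning rate, the step count and the function names (including <code>stir</code>) are assumptions chosen for this example only.

```python
def loss(w, pairs):
    """Mean squared error: how wrong the answers currently look."""
    return sum((w * x - y) ** 2 for x, y in pairs) / len(pairs)


def gradient(w, pairs):
    """Derivative of the loss with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in pairs) / len(pairs)


def stir(w, pairs, learning_rate=0.05, steps=200):
    """Repeatedly nudge w downhill along the gradient."""
    for _ in range(steps):
        w -= learning_rate * gradient(w, pairs)
    return w


# Made-up input-output pairs that follow y = 2 * x.
pairs = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = stir(0.0, pairs)
print(w, loss(w, pairs))
```

Each pass through the loop is one "stir": the parameter moves a little, and the answers get a little less wrong.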

The title text refers to recurrent neural networks, which are a useful class of deep neural networks for dealing with sequence data like speech or text.
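The idea of recurrence can be sketched as a state that is fed back in at every step, so that old data lingers in the pile. This is a toy with a single number as its state, not a real recurrent network; the names <code>rnn_step</code> and <code>run</code> and the weights <code>w_state</code> and <code>w_input</code> are assumptions for illustration.

```python
import math


def rnn_step(state, x, w_state=0.5, w_input=1.0):
    """Mix the previous state with the new input, then squash with tanh."""
    return math.tanh(w_state * state + w_input * x)


def run(sequence):
    """Feed a sequence through one item at a time, carrying the state along."""
    state = 0.0
    states = []
    for x in sequence:
        state = rnn_step(state, x)
        states.append(state)
    return states


# One non-zero input followed by zeros: its influence fades but does not vanish at once.
states = run([1.0, 0.0, 0.0])
print(states)
```

Even after the input goes back to zero, the state stays positive for a while, because each new state depends on the previous one.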

This comic satirizes machine learning, and more specifically neural networks. In its most basic form, a neural network takes data and the expected results, strengthens the connections that give the right answer and weakens the ones that don't, until the results "look right". Neural networks are extremely data-dependent and make remarkably few guarantees compared to most other computing techniques, hence the joke.

Cueball's machine learning system is probably very inefficient, as he is integral to both the mechanical part (repeated stirring) and the learning part (making the answers look "right"). 

''Some forms of neural networks, such as LSTMs, feed old sequence data back into the network with some delay, making them recurrent. The title text calls this the pile "getting mushy". The title text may also be a pun based on how Cueball is going through the data. Instead of using a shovel, he is using a canoe paddle. Canoes can be used on rivers, and rivers have currents. Thus "recurrent" data could, in this situation, mean data treated as if it were part of a river.''

In large-scale composting operations, the raw organic matter added to the pile is referred to as "input". This cartoon implies a play on the term "input", comparing a compost input to a data input.

==Transcript==

[Cueball, holding a canoe paddle at his side, is standing on top of a "big pile of linear algebra" containing a funnel labeled "data" and a box labeled "answers" while talking to a Cueball-like person to the left (from the reader's perspective).]

Guy: <i>This</i> is your machine learning system?

Cueball: Yup! You pour the data into this big pile of linear algebra, then collect the answers on the other side.

Guy: What if the answers are wrong?

Cueball: Just stir the pile until they start looking right. 

{{comic discussion}}