# What are Artificial Neural Networks and how do they work? A non-technical explanation

Note: This is a guide for people without a math or code background, written by a medico for other people from a medical/biology background (skip the bio bits if you aren’t from bio). If you are the mathy type, this is not for you; there are plenty of other resources.

### Introduction: What is this AI business?

Artificial Intelligence (AI) is an umbrella term for scientific fields whose aim is to mimic or replicate human-like skills with computers. A large part of this field is driven by computer science and mathematics.

One of AI computing’s main goals is to create self-learning or self-training systems or algorithms. The field of AI computing that tries to create, test and use such algorithms is called Machine Learning (ML). Artificial Neural Networks (ANNs) are one such algorithm; they are very popular and have led to a lot of breakthroughs in what computers can do.

Artificial Neural Networks (ANNs) are software algorithms composed of bits of code that can do math and store information (neurons). These neurons pass information (inputs) back and forth between each other, making slight changes till a particular result (output) is achieved.

This article offers a largely non-mathematical and no-code explanation of how they do this.

### What are ANNs

Consider the retina: it has layers, and when light hits these layers, it triggers (gets converted into) various types of signals (chemical and electrical). All these signals get passed on to the optic nerve and then to the occipital lobe, which, working in concert with the rest of the brain, interprets what the signal means and produces an output: vision. [1]

This is not how an ANN works.

Like the retina, though, ANNs are code (neurons) arranged in layers, sandwiched between an input layer, which receives the signal, and an output layer, which produces the results, followed by a verifier, which determines whether the right output was produced or not.

### Why are they called neurons and artificial neural nets (ANNs)?

There’s a theory that neurons learn by passing on information that meets a certain criterion (activation threshold) to other neurons (spreading of activation) and getting repeated feedback from those neurons about how to correct these activations, till each neuron activates for the right signal or passes on the right information. [2]

For example, when you learn to do a physical task, like opening the fridge, the very first time you do it, your muscles and the nerves that control them don’t really know how much force to use, or how much signal should pass between which neurons controlling which muscles to open the door smoothly. So the system picks a starting amount of power and tries, the sense organs feed back whether it’s working or not, and based on that, it adjusts the amount of power needed. Over time the flow of information and signals between all the nerves, muscles and various organs involved in opening this door becomes so good and so optimized that it is effortless.

This is thought to be because of the back-and-forth “this is working, this is not working, too much pressure, too little pressure, wrong angle, right angle” messages that rapidly pass between the hand and the brain. And it is this self-correcting back-and-forth messaging system (back-propagation of error) which eventually figures out the right way to solve the problem, by finding the right amount and type of information (signal) to pass between the neurons involved in this action.

This is called the connectionist theory of cognition, and this system of learning is called a connectionist system because the way it works is thought to be due to the strength (weight) of its connections and finding the right connections.

Most neuroscientists believe this theory to be completely wrong about how the brain actually learns. [3]

But either it’s got a kernel of truth in it, or for some mysterious reason this system of learning works when turned into computer code, and so it is called an artificial neural network.

### So what does each neuron in these ANNs do?

Fundamentally, it does two things:

1. When it receives a signal it transforms it and sends it forward to other neurons it is connected to. (Forward pass)
2. When it receives some information back from neurons about what was wrong with the earlier output, it adjusts the output and then passes the information along to the neurons behind it. (Backward pass)

Think of it like a complicated game of Chinese whispers.

You’ve got people standing in a row, and the game host says “life is meaningless” to the first person. Person one hears “wife is meaningless” and blurts it out to the next person, who hears “wife is weaning less”; the next person hears “life is winning less”, and so it goes. Now, in this version of the game, instead of revealing the answer, the verifier standing right at the end says “you were wrong by x percent”. So the last person passes this to the person before them, and the whole process is repeated a very large number of times till the message reaching the verifier is “life is meaningless”. Note that the last person isn’t told that the right answer is “life is meaningless”, only that it was wrong by a small amount.

Now consider that this is being done in parallel with multiple rows, each row hearing a different part of the message, and what the verifier wants to hear is “life is meaningless, cried the nihilist, while munching on stale peanuts he stole from the existentialist’s larder”.

Sounds like a nightmare, and a mathematical nightmare it is for people like me who start getting hypoglycemic when letters become numbers and numbers get letters added to them, sometimes before, sometimes after, sometimes on top etc. Fortunately, the programs do all the math themselves and we just need to know two things: what matrix multiplication is and what a linear equation is.

Matrix multiplication is a method of multiplying two sets of numbers arranged in a particular way. A matrix looks kinda like this

$\begin{bmatrix} 1 & 2 \\ 2 & 4 \\ 5 & 6 \end{bmatrix}$

So it’s like a table of numbers but they encode some other information like which axis a number is on and stuff. There are specific rules about how two or more matrices can be multiplied. We won’t go into that because it is boring.

The reason we want to know about this is that in machine learning one thing we do a lot is turn all our data into matrices and then multiply them.

We represent numerical data as vectors and represent a table of such data as a matrix. The study of vectors and matrices is called linear algebra. (Source: Math for ML)

Irrespective of what kind of data you need to provide as input, all of it gets converted into numbers. But you already know that that’s how computers work. We just use a specific method for representing these when doing ML, which is turning everything into vectors or matrices.

Linear equations (which are what linear algebra uses) all kinda look like $ax + b = c$

This is a mathematical formula that will always produce a straight line if you change the values and plot the results.

This means that, if you know the $a$ and the $b$ and the $c$, you can guess what the $x$ is and solve the equation [no duh].

But even if you don’t know what $c$ is, but can find out if the number you came up with (your $c$) is bigger than or smaller than $c$, you can adjust the $x$ to get the right answer.

The reason we need this kind of mental contortion is that, as in our Chinese whispers example, the person at the verification end doesn’t say what the answer is, only that the given answer is right or wrong, plus some directional information. This behavior is not there just to create confusion; it’s designed to facilitate learning, because what we want is not a network that has memorized the answers (which it would if you gave it the answer), but a network that has learned the method to arrive at the answer (which you torture it into learning).

So if you have a formula like $50x + 70 = 520$ and you don’t know the answer is $520$, but have this oracle who can tell you if the number you come up with is bigger or smaller than the real answer, how would you solve this?

You could begin by guessing $x=1$, and get the answer $c_1 = 120$. Ask the oracle “is $c_1 \ge c$?” and you will hear $FALSE$.

Which means your $x$ needs to be bigger. OK, how about $x=10?$ This will give us $570$.

Oracle, is $c_{10} \ge c$? $\rightarrow TRUE$

Now we know we (probably) need to make the $x$ smaller to get the right answer.

As you can imagine, depending on how you change $x$, within a few “steps” up and down you will get the right answer. [4]

This painful iterative way is exactly how neural networks self correct.
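For the code-curious (everyone else can safely skip this), here is a minimal sketch of that guess-and-adjust loop. The `oracle` function and the shrinking step size are made up for illustration; real networks update their weights in the same spirit, just with fancier math.

```python
# A minimal sketch of solving 50x + 70 = 520 by guessing.
# The oracle only ever tells us "too big or not", never the answer.

def oracle(c_guess, c_true=520):
    """Answers one question: is our guess >= the hidden answer?"""
    return c_guess >= c_true

def solve_by_stepping(a=50, b=70):
    x = 1.0      # start with a wild guess
    step = 1.0   # how far we nudge x each time
    for _ in range(10_000):
        c_guess = a * x + b
        if oracle(c_guess):
            x -= step   # too big: step x down
        else:
            x += step   # too small: step x up
        step *= 0.9     # shrink the step so we settle instead of bouncing
    return x

x = solve_by_stepping()
# x ends up very close to 9, since 50*9 + 70 = 520
```

The shrinking step is one simple strategy for settling on the answer; gradient descent, which comes up later, is a cleverer way of choosing both the direction and the size of each step.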

Let’s say we don’t really know the right $a$ or $b$ either. Can we still solve this problem?

How can we come up with the $a$ and the $b$?

Not surprisingly, we can use the earlier approach of just picking some random numbers, putting them into the equation and then adjusting them up and down till we get the formula we’re looking for.

To keep the moving parts minimal, let’s make $b$ a constant: we just pick a random number and leave it fixed throughout all the iterations. $x$ we pick randomly as before, but we will adjust it as we get more information. For $a$, we can pick a random number to begin with, and after that use a different strategy, described below, to update it.

OK, let’s use this to solve a pressing real-world problem. Let’s say we need to make a neural network that correctly identifies the various parts of a goat from a photograph.

So first we convert a photo into a matrix of numbers. [5]
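You don’t have to take footnote [5] entirely on faith. Here’s a toy illustration (all the numbers are invented) of a tiny greyscale “photo” as a matrix, where each entry is one pixel’s brightness:

```python
# A made-up 3x3 greyscale "photo": each number is how bright one
# pixel is, from 0 (black) to 255 (white). A real photo is the same
# idea, just a much bigger grid (and colour photos use three grids).
photo = [
    [0,   50, 200],
    [30, 180, 255],
    [10,  90, 120],
]

print(photo[1][2])  # the pixel at row 1, column 2 (counting from 0): 255
```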

Then we show the input neuron a part of the photo near the goat’s head. And we’re gonna ask it to guess what it is, but in linear-equation-math.

So to begin with, let $a = 1, b = 0 \space and \space x = 1$

$Step \space 1: a_1x_1 + b = c_1$ or $1*1+0 =1$

(In non-math, this humble output $1$ means “this is a goat’s horn”)

Now, based on the information that it gets back, it will know if the x should “step” up or down. Let’s say the feedback says, “too high bro”.

So then, let’s step down the $x$ and let $x_2 =0.5$

Given that we already have a piece of data sitting in our neuron, namely the last output ($c_1$), instead of starting with a brand-new $a$ (or answer), we can just modify this answer using the new $x$ and the same constant $b$; that way it’s nudged in the right direction. So we use $a_2 = c_1$. This works because $c_1$ isn’t a random number, it’s the result of the assumptions you made earlier. Or, it is “This is a goat’s horn”, which you can step down or modify using the new information.

$Step \space 2: a_2*0.5 + b = c_2$

What happened mathematically is that we produced a result, something like “this is a goat’s neck”.

You can imagine, that if you repeat this enough times, based on how large your “step” for $x$ is, and based on what the other neurons in the system are saying, you will at some point get the right value of $c$, which could be “this is a goat’s right earlobe”, and this is now what it has “learned”.

So the next time it sees something like an earlobe, it will be able to identify it instantly and not confuse it with the other dangling objects that goats possess.

And isn’t it cool that we could come up with a way to just make a wild guess, then with feedback adjust the wild guess into a coherent answer? (Wait, is that how human learning works?)

I think now you might be able to see that $x$ is, in a sense, the importance or weight you give the input $a$ to produce an output $c$, and $b$ acts as a constant nudge in a particular direction, or a bias.

So, in summary

A neuron gets “inputs $(a)$”, multiplies each by a “weight $(x)$” and adds a “bias $(b)$” to produce an “output $(Y)$” (I know we called it $c$ earlier, but confusion is our friend). How do we get to a final output? By summing over all the inputs. In math:

$\displaystyle\sum (weights*inputs) + bias = Y$

That fancy squiggle means SUM of the things to its right.
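If it helps, that whole formula is one line of code. This is a sketch, not anything from a real library, and the numbers are invented:

```python
# A single neuron's forward pass:
# sum of (weight * input) over all inputs, plus the bias.
def neuron_output(inputs, weights, bias):
    return sum(w * a for w, a in zip(weights, inputs)) + bias

# Two inputs, each with its own weight, plus one bias:
y = neuron_output(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.1)
# 0.5*1.0 + (-0.25)*2.0 + 0.1 = 0.1
```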

And the updating of the weights happens by passing back information about whether our answer was right or wrong, and in which direction. In computer science this is called back-propagation of error with gradient descent. That’s not important, but it is useful for showing off.

We’re nearly done, there’s just one more thing I mentioned earlier, a threshold of activation, which will wrap up the whole thing neatly.

While a neuron produces all this data, we don’t want it to fire all the time, right? Suppose what it “heard” was too soft for a meaningful answer, or so “loud” that it PASSES ALONG A SHOUT LIKE THIS!!!!!! These kinds of signals can increase the error, or confusion.

So for hygiene, it’s better that we pass on only information that is of uniform volume all over the network and only if it passes some kind of a test of importance (quality check). To do this, we could pass the output $(Y \space or \space c)$ through a mathematical transformation that achieves this. This transformation is called the activation function.

Depending on the type of task, the activation function could be something like: take the mean, or squash the value into a range between 0 and 1, or something else.

Those of you who remember your physiology lectures might notice how much this is like an action potential in a neuron. That is not a coincidence. What ANNs do is inspired by real neurons.

An activation function is often represented with the Greek letter phi ($\phi$)

To update our earlier equation

$\phi(\Sigma(weights*inputs) + bias) = Y$

So now we have a way of making guesses that slowly move in the right direction, and we have an activation function that makes sure only the good stuff gets passed along in a standardized way.
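To make the “uniform volume” idea concrete, here’s a sketch using one common activation function, the sigmoid, which squashes any number into the range 0 to 1. The inputs are invented for illustration:

```python
import math

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    # phi(sum(weights * inputs) + bias), with phi = sigmoid
    z = sum(w * a for w, a in zip(weights, inputs)) + bias
    return sigmoid(z)

print(sigmoid(-100))  # a whisper comes out very close to 0
print(sigmoid(0))     # a middling signal comes out as exactly 0.5
print(sigmoid(100))   # A SHOUT comes out very close to 1, never louder
```

Whatever the raw output was, too soft or SHOUTED, what gets passed along is always a tame number between 0 and 1.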

Wait, now we know how a linear equation can be solved with guessing, but this technique isn’t limited to linear equations. A network like this, by guessing randomly at first and then passing back information about how wrong it was and updating the guess, can learn to approximate almost any mathematical function. In math this is called the universal approximation theorem, and it is kind of a big deal.

That’s it. That’s how a neural network do.

Congratulations you now have a working understanding of how math equations can be fiddled with to get to the right answer. And that is the basis of all artificial neural networks, and a whole lot of AI and ML. [6]

1. Han, Su-Hyun, Ko Woon Kim, SangYun Kim, and Young Chul Youn. “Artificial Neural Network: Understanding the Basic Concepts without Mathematics.” Dementia and Neurocognitive Disorders 17, no. 3 (2018): 83. https://doi.org/10.12779/dnd.2018.17.3.83.
2. If you prefer a more mathy explanation: Michael A. Nielsen, “Neural Networks and Deep Learning”, Determination Press, 2015.

### Footnotes

[2]: Connectionism on the Stanford Encyclopedia of Philosophy. It’s a fun read. ↩︎

[4]: Some of you smarty-pants would like to point out that a $>=$ won’t ever give us exactly the right answer, but please sit down; this is a loosely true mathematical explanation and other people get it. This isn’t for you anyway. ↩︎

[5]: At this point you just have to trust me that this can be done, but also check out this post about pixels and greyscale images and how they are formed ↩︎

[6]: Some of you might want to ask, say, what is artificial intelligence then? Well, AI, is just artificial neural networks making a whole lot of really really complex guesses and then self correcting them till they are kinda accurate. ↩︎