Artificial Intelligence for All: Everyone Can Understand It
Technology doesn’t always have to sound daunting. Terms do not always have to be so esoteric, exclusive and abstract. I am an advocate for simplicity. As Leonardo da Vinci said:
“Simplicity is the ultimate sophistication”
And as Albert Einstein put it:
“If you can’t explain it simply, you don’t understand it well enough.”
Words ever true.
As an engineer, I sometimes get frustrated by big terms describing very simple concepts. I like simple things. I catch myself thinking: could you not just call it by the simple word?
But I get it. There are too many concepts across different fields that could claim the same simple word. So each field strives for distinction, naming a concept after a Greek term or after the first person who discovered it, or rather, the first known person who documented and publicized it.
The problem is that these names often do not sound as simple as what they describe. And that is alienating for a lot of people.
Simplicity is inclusive
In this article I will explain AI in plain English, without the technical jargon, so that anyone who understands English can grasp the depths of it.
What is It?
Artificial intelligence simply means the ability of computers (and, by extension, other machines) to do tasks intelligently without being explicitly told how to do them. They are intelligent enough to make decisions on their own. An example could be recognizing animals: cat, dog, horse, cow, etc.
With artificial intelligence, you do not need to tell the computer explicit rules for distinguishing between these animals like if it barks it is a dog, or if it meows it is a cat.
Explicit rules might give a semblance of intelligence but that is not artificial intelligence.
Artificial intelligence is allowing the computer to learn the rules the way only it can do because we cannot possibly define all the rules explicitly.
What we do with AI instead is show the computer images of dogs, cats, horses and cows, and it learns its own “rules” for distinguishing between them. As the engineers building the intelligence, we often don’t even understand the rules that the AI comes up with. We simply ensure that whatever it learns works well for our use case.
How do Machines Become Intelligent?
The simple answer is we teach them. In the example above, we show the computer images of cats, dogs and horses and it will learn what the differentiating features between them are by itself.
How Do We Teach Them?
If you have spoken to anyone working in the AI or data science fields, you might have heard phrases like: model, training a model, the model is learning or not learning, machine learning, deep learning, etc.
The model is really just like a brain that does the decision-making in an AI system. The techies might call it a machine learning model or a deep learning model, but that’s just terminology loosely suggesting how the model was built, and the terms are often used interchangeably.
As I said, the model is analogous to the brain. However, it is like a very small, specialized brain meant for a highly specific task, e.g. recognizing animals.
It can do better than most humans on that specialized task, but unlike the human brain, which can do a thousand other tasks, a model can only do the one specialized task it has been trained for.
The Process of Building AI Models
An AI model is built by writing an algorithm in computer code and then feeding that algorithm with data (e.g. pictures of cats, dogs, horses and cows) so that it can learn the patterns (rules) it will use to distinguish between them.
After writing the algorithm in computer code (equivalent to building an empty brain), we get an untrained model (a brain that hasn’t learned anything yet). The process of feeding the untrained model with data is what is described as training the model. While we are training the model, the model might be learning or not learning.
Learning means it is identifying patterns in the data and mapping the patterns of each data point to the correct result, in a way that allows it to do the same for new data points it was not fed during training. Example: it identifies the patterns in the dog images and maps them to dogs, identifies the patterns in cat images and maps them to cats, and when given new dog and cat images, it can do the same correctly.
An AI model (brain) is therefore algorithm code + data, producing the ability to recognize patterns.
When this model encounters data that it hasn’t seen before, it can identify patterns in the new data using knowledge from the patterns it has seen before during training. And with the patterns, it can make a decision like this is a horse and that is a cow.
Analogy: Teaching a Kid
AI models actually start out like learning babies. The newly written model algorithm (computer code) is analogous to the brain of a newborn baby: empty, though not nearly as capable. It is powerful and can learn, but it hasn’t been taught anything.
For the purpose of simplification in this article, we will use the 5 stages of mastery (novice, advanced beginner, competent, proficient and expert) to loosely describe and differentiate how well a model has been trained and can perform a task. (This is my attempt to use inclusive language instead of technical jargon.)
Consider a novice model to be one that hasn’t been trained at all. It’s only code that hasn’t seen any data. The others follow after novice just like humans do.
Cool? Okay! Let’s keep going.
Training a model is the process of showing it data. And we do it the same way you would teach a child the difference between your cat and dog. You point to the cat and say ‘cat’, you point to the dog and say ‘dog’. Likewise, we send images to the model with labels that say this is a cat, this is a dog, this is a dog, this is a cat, …
After many repetitions (iterations), you point to the cat and wait for the child to make a prediction. The child says cat; you clap. You point to the dog. The child says dog; you celebrate. If they make a mistake, you say uh uh, no no, and point a few more times: cat, dog.
Most models learn the same way. We feed the model a lot of data (images in this case) with labels that say cat, dog, horse, or whatever the correct label is. This is called supervised learning. We are training (teaching) the model by showing it samples of what we want it to learn while telling it what each sample is. Thus, the model’s learning is supervised.
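To make the idea concrete, here is a toy sketch of supervised learning in a few lines of code. Everything in it is made up for illustration (the features, the numbers, even the "model", which here just memorizes its labeled samples and classifies a new sample by whichever labeled sample it most resembles); real models are far more sophisticated, but the shape of the process is the same: labeled samples in, predictions out.

```python
def distance(a, b):
    """Squared distance between two feature tuples: how different two samples look."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train(samples):
    """'Training' in this toy model is simply remembering the labeled samples."""
    return list(samples)

def predict(model, features):
    """Label a new sample with the label of the most similar training sample."""
    nearest = min(model, key=lambda s: distance(s[0], features))
    return nearest[1]

# Hypothetical labeled data: (weight in kg, ear length in cm) -> label.
labeled = [
    ((4.0, 4.0), "cat"),
    ((3.5, 5.0), "cat"),
    ((20.0, 10.0), "dog"),
    ((25.0, 12.0), "dog"),
]

model = train(labeled)
print(predict(model, (4.2, 4.5)))    # prints "cat": closest to the cat samples
print(predict(model, (22.0, 11.0)))  # prints "dog": closest to the dog samples
```

Notice that nobody wrote a rule like "if it weighs under 10 kg it is a cat"; the labeled samples carry all the knowledge.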
Now you can see why data is a big deal in this enterprise of training models. Just like with teaching kids, the correct information is crucial. If you teach the child with data that doesn’t fully represent all dogs and cats, they will make mistakes.
Suppose all the dogs you showed your child were black, and all the cats had grey fur. The child is likely to start using color as the feature that tells cats and dogs apart. As such, if they don’t know what a horse is, they might look at a black horse and say ‘dog’, because all the dogs you showed them were black and now they associate black with dog. We are aware of this, and that’s why, when teaching kids, we use dogs of different colors in pictures and on TV.
In supervised learning, models, like children, will learn incorrect patterns if the data we use to train them is dominated by some non-fundamental differentiating feature. So we have to endeavor to show the model many variations of what we want it to learn.
If we want our model to learn the difference between cats and dogs, we actually need to show it a lot more images than we would a child. So we collect thousands of images of different types of dogs and cats, in different environments and backgrounds, different postures, angles, lighting conditions, etc. The variation in the data needs to be representative of what is possible in the real world for the model to do well in the real world.
Remember, I have been describing models learning by what method? Supervised learning: where we tell the model what each data sample is and allow it to learn patterns and map them to that label. Supervised learning is the most widely used and most reliable approach to training models.
But there are other ways in which models learn. There is
- Unsupervised learning,
- Semi-supervised learning and
- Reinforcement learning.
Let’s continue with our teaching a baby analogy to explain these just briefly.
A scenario in which a child learns by unsupervised learning
Let’s say you have a kid (let’s call him Kunta and assume he is 3). You buy Kunta 7 hand-held toy cars, 7 toy bikes, and 7 toy bicycles. You put all of these in one toy bag and hand them to him. Assume he has never seen any of these before and doesn’t know what they are.
Unsupervised learning is allowing Kunta to play with his toys all by himself without telling him what they are. After some time, Kunta will still not know exactly what the toys are by name, but you will notice that he has grouped all the cars together, all the bikes together, and all the bicycles together. He has learned which toys belong together because of their common features: they look alike.
Putting things that are similar, and so belong in the same category, together like this is called clustering. It is a typical unsupervised learning technique. Kunta will also learn by himself that cars can stand without support and occupy more space than bikes and bicycles. He still doesn’t know what they are, but he has made associations between the objects and their ability to stand without support and how they use space. Making associations means finding relationships with other external things. Besides clustering, unsupervised learning is used to find associations. Let’s see examples of these.
Market segmentation: clustering is typically used to group customers with similarities together. Just like with Kunta and the toys, a model doesn’t know who the customers are or what they like, but with data about them, it can group customers with similarities together, like similar shopping habits or similar shopping hours. Associations can also be made, where the model learns that customers who buy item A also tend to buy item B. So you may not have similarities with customers who buy diapers, but if you happen to buy one (say you are getting it for a friend), the model can be like: customers who bought this item (diapers) also bought diaper creams or rash ointments. I bet you have seen this on e-commerce sites like Amazon and movie streaming sites like Netflix. There’s an expert unsupervised learning model in the background doing the predictions.
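The grouping idea above can be sketched with a bare-bones version of a classic clustering algorithm called k-means. The customer spend figures below are invented for illustration; the point is that the code is never told which customer belongs to which group, yet it discovers the groups from similarity alone, just as Kunta grouped his toys.

```python
def kmeans(points, k, iterations=10):
    """A minimal one-dimensional k-means clustering sketch."""
    # Start with the first k points as the cluster centers.
    centers = points[:k]
    for _ in range(iterations):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        # Move each center to the average of its assigned points.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters

# Hypothetical "monthly spend" figures for eight customers, no labels at all.
spend = [20, 22, 25, 21, 300, 310, 295, 305]
groups = kmeans(spend, k=2)
print(groups)  # the low spenders end up in one group, the high spenders in the other
```

Nobody told the algorithm what "low spender" or "high spender" means; the two segments fall out of the data by themselves.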
Still with your toy bag and Kunta, semi-supervised learning is you picking, for instance, one object from each category and telling Kunta that this is a car, this is a bike and this is a bicycle. Now, from that one labeled sample per category, Kunta knows all the cars, bikes and bicycles. You did not need to label each and every sample as in supervised learning. Hence, semi-supervised learning is a balance between supervised and unsupervised learning, and it saves a great deal of the time and effort that would be needed to label each and every sample.
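A tiny sketch of that idea in code: one labeled "seed" per category, and every unlabeled toy simply takes the label of the seed it most resembles. The single "size" feature and the numbers are made up for illustration; real semi-supervised methods are much richer, but this captures the economy of labeling only a few samples.

```python
def spread_labels(seeds, unlabeled):
    """Give each unlabeled item the label of the nearest labeled seed."""
    labeled = {}
    for x in unlabeled:
        nearest = min(seeds, key=lambda s: abs(s[0] - x))
        labeled[x] = nearest[1]
    return labeled

# Made-up "size" feature: cars around 10, bikes around 5, bicycles around 2.
seeds = [(10, "car"), (5, "bike"), (2, "bicycle")]  # one labeled toy per group
toys = [9, 11, 4, 6, 1, 3]                          # the other, unlabeled toys

print(spread_labels(seeds, toys))
# prints {9: 'car', 11: 'car', 4: 'bike', 6: 'bike', 1: 'bicycle', 3: 'bicycle'}
```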
How Kunta Learned What ‘Hot’ Meant: Reinforcement Learning
This is how Kunta learned that the iron was not safe to touch sometimes, i.e. when it was plugged in, or that food could burn and sometimes you need to wait. Reinforcement learning works by a reward/punishment approach. A plugged-in iron burns; unplugged, it doesn’t. It took Kunta a while and a series of confusions to get it, but he did.
You couldn’t very effectively tell him what hot is, or that an iron could be hot at some times and not at others, or that it was unsafe sometimes and safe at others. He had to experience it, get punished, and then you helped him associate the burning sensation with a word: hot. Hot, do not touch. And eventually you helped him associate a hot or cold iron with being plugged into or disconnected from the power socket.
Many models are increasingly being trained this way, especially models deployed in robots that explore the real world. Some actions are rewarded and others are punished. Reward means keep going; punishment means change, try something else.
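Here is a bare-bones sketch of that reward/punishment loop. The scenario and numbers are invented to mirror Kunta's iron: the learner tries actions, the environment hands back a reward or a punishment, and the learner nudges its opinion of each action toward what it experienced. Real reinforcement learning systems are far more elaborate, but this is the core loop.

```python
import random

random.seed(0)  # fixed seed so the toy run is repeatable

actions = ["touch_plugged_iron", "touch_unplugged_iron"]
value = {a: 0.0 for a in actions}  # the learner's current opinion of each action

def environment(action):
    """Punish (-1, a burn!) touching the plugged-in iron; reward (+1) the safe action."""
    return -1.0 if action == "touch_plugged_iron" else 1.0

for step in range(200):
    # Mostly exploit the best-known action, but explore sometimes (like a curious kid).
    if random.random() < 0.2:
        action = random.choice(actions)
    else:
        action = max(actions, key=value.get)
    reward = environment(action)
    # Nudge this action's value a little toward the reward just received.
    value[action] += 0.1 * (reward - value[action])

print(max(value, key=value.get))  # prints "touch_unplugged_iron": the safe action won
```

After a few burns early on, the learner settles on the safe action, exactly the way Kunta did.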
Some brains are smarter than others
Across all the learning approaches discussed above, and using terminology adopted very loosely only for this article, we could have a novice, advanced beginner, competent, proficient or expert model in terms of its mastery of the task it’s meant for. Imagine training a model until it becomes an expert at its task. That takes a lot of data and a lot of time to train.
However, the amount of data needed and the time it takes will depend on the model’s mastery level when you start. If your starting point is a novice model, it will require a lot more data and time than if the model is at proficient mastery level already. i.e. it has been trained before.
Training a model from novice level is less common in the field of artificial intelligence today. Mostly we start with a model that has already learned from other data on a related task, and then we train it further with more specialized data suited to the task we want it to perform. This increases the intelligence of the model for the specific purpose we have at hand. The techies call this transfer learning. It’s like saying
Hey model, you already know how to differentiate between cats and dogs very well. Take this data and, using the knowledge you already have, tweak it to distinguish between these owls and eagles.
Let’s check on Kunta to make this clearer. By the time Kunta is 5, he already knows his shapes very well. But he has never seen a trapezium or a parallelogram. However, his brain is no longer a novice when it comes to shapes. So you say, “Hey Kunta, I am sure you haven’t seen these before. Here’s a trapezium and here’s a parallelogram.”
Let’s ignore the fact that Kunta can’t stop laughing at your ridiculous new word pa-ra-re-ro-gram.
Learning these is rather easy for him. He already knows how to map features quickly, and he does so with very little effort compared to when he was first introduced to shapes. He is transferring old knowledge to quickly learn new, related things.
This is why achieving impressive intelligence on new tasks in the field of AI doesn’t always require lots of data. The power of transfer learning makes that a breeze compared to starting from scratch.
It is possible to achieve impressive intelligence with as few as 100 well-labeled samples suited to a new task by leveraging an already trained model and transfer learning.
And this is something even a lot of techies don’t consider soon enough. Ask them to build a model and many start thinking from scratch, i.e. building a new model brain and looking for impossible-to-find data to train it.
Let’s leave Kunta and come to the world of adults. A round of applause for Kunta as he exits the stage, please 👏 👏👏👏. Phenomenal kid.
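A conceptual sketch of transfer learning, with everything invented for illustration: the "pretrained" feature extractor below is a frozen hand-written function standing in for a model that already learned, from lots of related data, how to turn raw inputs into useful numbers. For the new task we fit only a tiny new classifier head on top of it, using just four labeled samples.

```python
import math

def pretrained_features(image):
    """Frozen stand-in for a pretrained feature extractor: it already 'knows'
    how to summarize a raw sample into useful numbers (here: average and spread)."""
    return (sum(image) / len(image), max(image) - min(image))

def train_head(samples, steps=500, lr=0.1):
    """Fit a tiny logistic-regression head on top of the frozen features.
    Only these few numbers are learned; the extractor is never touched."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(steps):
        for image, label in samples:
            f = pretrained_features(image)
            z = w[0] * f[0] + w[1] * f[1] + b
            p = 1 / (1 + math.exp(-z))   # predicted probability of class 1
            err = p - label              # how wrong we were
            w[0] -= lr * err * f[0]
            w[1] -= lr * err * f[1]
            b -= lr * err
    return w, b

def predict(params, image):
    w, b = params
    f = pretrained_features(image)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0

# Only four labeled samples for the brand-new task (0 = "owl", 1 = "eagle").
few_shot = [([1, 2, 1], 0), ([2, 1, 2], 0), ([8, 9, 8], 1), ([9, 8, 9], 1)]
head = train_head(few_shot)

print(predict(head, [1, 1, 2]), predict(head, [9, 9, 8]))  # prints: 0 1
```

Because the heavy lifting (feature extraction) was done beforehand, a handful of samples is enough for the new task, which is the whole appeal of transfer learning.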
Now, you take the stage. I suppose a round of applause is due as well 👏👏.
Imagine the Tukus are your neighbors. You’ve seen them so many times that you are familiar with their resemblance and key features like height, size, shape, smile, the way they laugh and walk, and their silly mannerisms. Then, one day while out of town for a conference, you meet a random stranger who looks and behaves a little like the Tukus back home. On asking, this person turns out to be a relative of the Tukus.
Now, imagine that 97% of the time you saw the Tukus, they were all wearing some funny blue jackets (they have an idiotic sense of style). If your new stranger doesn’t have a blue jacket, and instead has an exquisite sense of style, then even if he looks like the Tukus on some features, this is going to reduce your confidence that this person could possibly be related to the blue-jacket-wearing, funny-fashioned Tukus.
Also, a totally non-look-alike person wearing the same unique blue jacket in a seemingly unfashionable way is going to raise your suspicion that this person must be related to the blue-jacket-wearing Tukus. Your judgement has been biased by something that isn’t fundamental to recognizing people who may or may not be related.
This is bias in the real world. Your judgement is biased by your skewed experience of the Tukus. Your brain is not the problem; the data it had seen prior is. It is the same with models. State-of-the-art models can sometimes reach very stupid conclusions because the data they have seen was dominated by some feature which is not the most important.
A funny example my lecturer recounted back in grad school is of a model trained to detect military tanks. Remember what the training process looks like? It is supervised learning: you feed the model labeled images with tanks and without tanks. After this model was trained, it seemed to do very well on some images but horribly on others. What was the problem? It turned out that among the training images, a large majority of those with a tank were captured in sunny weather and a large majority of those without a tank were captured in dull weather. So what the model actually learned was to differentiate between dull and bright weather.
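The tank story can be reproduced in miniature. In this invented sketch, each photo is reduced to two made-up numbers: brightness and a "tank shape" score. The real signal is the shape score, but in the skewed training set brightness happens to correlate perfectly with the label, so a simple nearest-neighbor "model" keys on brightness and gets a dull-weather tank wrong.

```python
def predict(training, features):
    """Classify a photo by the label of the most similar training photo."""
    nearest = min(training, key=lambda s: sum((a - b) ** 2
                                              for a, b in zip(s[0], features)))
    return nearest[1]

# Features: (brightness, tank_shape_score). In this skewed training set,
# every tank photo happens to be bright and every no-tank photo dull.
training = [
    ((0.90, 0.7), "tank"), ((0.95, 0.8), "tank"),
    ((0.10, 0.2), "no tank"), ((0.15, 0.1), "no tank"),
]

# A tank photographed in dull weather (low brightness, high tank-shape score)
# fools the model, which learned weather instead of tanks:
print(predict(training, (0.1, 0.8)))  # prints "no tank", despite the tank shape
```

Add dull-weather tank photos and bright-weather empty fields to the training set, and the same code stops making this mistake; the fault is in the data, not the algorithm.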
The moral of the story is that data is the lifeblood of AI: good, diverse data that is representative of the real world in which the model will be used.
Good, diverse data is the lifeblood of good, safe artificial intelligence, just like good, accurate information is the lifeblood of your operating properly in the world. If you have incorrect information, you are fucked, and you are going to fuck others too. And the more incorrect information you have, the more fucked you are, and the worse off the people you deal with. There is no AI without good data, and the definition of good here makes for another article in itself.
Conclusion
AI is going to cut through all sectors of society. You do not need to understand it fully to use it. But you cannot afford to be clueless about it. I wrote this article explaining AI as an industrial-revolution-scale technology, just like electricity, just like the internet, which we all use today even though we do not understand them fully. At a minimum, you need to adopt AI in that way. I hope I at least brought you some clarity in plain English.