Deep Learning At A Surface Level

Chloe Wang
9 min read · Oct 3, 2021

Artificial intelligence (AI) is widely regarded as the future of not only technology, but humanity itself. Many expect that within this century, AI will change the way humans live and work across nearly all industries.

We define artificial intelligence as the following:

The theory and development of computer systems able to perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision making, and translation between languages. (Oxford Dictionary)

Essentially, this describes a computer that can think like a human. Today, applications of artificial narrow intelligence (AI that can only perform specific tasks) are everywhere, ranging from Google’s search recommendations to Tesla’s self-driving cars.

Google uses natural language processing, machine learning, and deep learning to power its predictive search engine.
Tesla uses computer vision, image detection, and deep learning to make their self-driving cars, by Fortune

Even Netflix uses AI to recommend which shows you should watch.

Netflix uses machine learning to give recommendations on what you should watch next, from Columbia Spectator

There are many subsets of AI, including machine learning (ML) and deep learning (DL).

What Is Machine Learning?

At a high level, ML is a subset of AI that provides computers (or machines) the ability to learn automatically and improve from experience without being explicitly programmed to do so. This subset was invented out of the need for models to analyze complex and large amounts of data, improve decision making, uncover patterns and trends in data, and solve complex problems.

ML is a very broad topic in itself, but there are a few key concepts of ML that you should be familiar with.

Types of ML

The basic process of machine learning is to provide an ML model with training data and have it learn from that data using an algorithm such as linear regression, logistic regression, K-means, or C-means. Then you can test its ability to handle new data using a separate test data set.

Supervised learning is a technique where we teach/train a machine using well-labeled data: every input in the training set comes paired with its correct output. Supervised learning is generally used for regression and classification problems.

For example, we could provide labeled images of cats and dogs, and the output would (ideally) be one class of cats and one class of dogs.
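To make the cats-and-dogs idea concrete, here is a minimal sketch of supervised learning using a nearest-centroid classifier. This is just one simple technique picked for illustration, and the two features (ear pointiness and body weight) and every number in the data are invented.

```python
# Toy supervised learning: a nearest-centroid classifier.
# Features are (ear_pointiness, body_weight_kg); all values are made up.

def train(samples, labels):
    """Average the feature vectors of each label to get one centroid per class."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], features))

# Labeled training data: every input comes with its correct answer.
train_x = [(0.9, 4.0), (0.8, 5.0), (0.2, 20.0), (0.3, 30.0)]
train_y = ["cat", "cat", "dog", "dog"]
model = train(train_x, train_y)

print(predict(model, (0.85, 4.5)))   # small animal with pointy ears -> cat
print(predict(model, (0.25, 25.0)))  # large animal with rounder ears -> dog
```

The labels do all the work here: the model only knows what a "cat" is because the training data told it which examples were cats.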

Unsupervised learning is what it sounds like: a technique where we teach/train a machine using unlabeled data and allow the algorithm to act on that information without guidance. Instead of classifying based on labels, the machine clusters data together based on similar characteristics.

For example, we could provide unlabeled images of cats and dogs, and the output would (ideally) be one cluster of cats, which would be recognized by characteristics like their pointy ears, and one cluster of dogs, which would be recognized by their round ears.
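Here is a rough sketch of clustering with K-means, one of the algorithms mentioned earlier. The features (ear pointiness and body weight) and the numbers are invented, and the starting centroids are picked by hand to keep the example deterministic. Real implementations usually pick them randomly and scale the features first.

```python
# Toy unsupervised learning: K-means clustering with no labels at all.
# Features are (ear_pointiness, body_weight_kg); all values are made up.

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

points = [(0.9, 4.0), (0.8, 5.0), (0.2, 20.0), (0.3, 30.0)]
# Assumption: start from the first and last point as initial centroids.
centroids, clusters = kmeans(points, [points[0], points[-1]])

for cluster in clusters:
    print(cluster)
```

Notice that the algorithm never learns the words "cat" or "dog". It only discovers that the data falls into two groups; naming those groups is left to a human.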

Reinforcement learning (RL) is another subset of ML where an agent is put in an environment and learns to behave in that environment by performing certain actions and observing the rewards it gets from those actions. Unlike supervised and unsupervised learning, RL agents don’t receive a prepared data set; the agent simply explores the environment. It follows a trial-and-error method and learns how to behave through rewards.

For example, let’s make the agent learn to play Mario Kart. If the agent moves forward across checkpoints in the course, then it earns points. If it doesn’t move, it doesn’t earn any points. If it moves backwards, it loses points. Since the agent is optimizing for reward, after many trials, it will learn to continuously move forward across checkpoints to receive the most reward.
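As a rough sketch of this reward loop, the code below uses tabular Q-learning, one common RL algorithm, on a tiny five-position "track". This is not how a real Mario Kart agent works. The track, the reward values (+1, 0, -1), and the hyperparameters are all assumptions made for illustration.

```python
import random

# Actions are listed so that ties between untrained values do not favor "forward".
ACTIONS = ["backward", "stay", "forward"]
MOVES = {"backward": -1, "stay": 0, "forward": 1}
REWARDS = {"backward": -1.0, "stay": 0.0, "forward": 1.0}
TRACK_LENGTH = 5  # positions 0..4; position 4 is the finish line

def step(position, action):
    """Apply an action and return (new_position, reward, finished)."""
    new_position = max(0, position + MOVES[action])
    return new_position, REWARDS[action], new_position == TRACK_LENGTH - 1

def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(TRACK_LENGTH) for a in ACTIONS}
    for _ in range(episodes):
        position = 0
        for _ in range(50):  # cap the length of each episode
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore: try something random
            else:
                action = max(ACTIONS, key=lambda a: q[(position, a)])  # exploit
            new_position, reward, finished = step(position, action)
            best_next = 0.0 if finished else max(q[(new_position, a)] for a in ACTIONS)
            # Nudge the estimate toward (reward now + discounted future reward).
            q[(position, action)] += alpha * (reward + gamma * best_next - q[(position, action)])
            position = new_position
            if finished:
                break
    return q

q_table = train()
policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(TRACK_LENGTH - 1)]
print(policy)
```

After enough trials, the learned policy should pick "forward" at every position, because that is the action that has paid off the most.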

All examples provided are heavily simplified.

Limitations of ML

However, these forms of ML have their limitations. To start, they struggle with high-dimensional data, where the inputs and outputs are very large. Realistic problems can have thousands of dimensions, which these methods cannot process efficiently, and that limits the problems ML can solve.

These forms of ML also cannot extract features very well. It’s difficult to tell the ML model what data to look for, so the effectiveness of the algorithm also heavily relies on how insightful the programmer is.

Despite this, there is a subset of ML that can overcome some of these challenges — deep learning.

Deep Learning, At A High Level

Deep learning (DL) is a subset of ML that can overcome some of the limitations mentioned above. Deep learning is defined as a collection of statistical ML techniques used to learn feature hierarchies based on the concept of artificial neural networks, which are modeled on neurons in the brain.

It is capable of properly extracting features and requires little assistance from the programmers. Certain DL models can also partially solve the dimensionality problems.

DL operates similarly to a brain. In a brain, there are dendrites that receive signals from other neurons, cell bodies that sum all the inputs, and axons that transmit signals to other cells. Similarly, DL models are built from perceptrons that receive information from previous perceptrons, sum their inputs, and pass the result on to the next layer.

Neuron Vs Neural Network, by Brian Mwandau

Before we start exploring single and multi-layer perceptrons, let’s first define what a perceptron is.

A perceptron is a linear model used for binary classification. It models a neuron with a set of inputs (X1, X2, … Xn), each with its own weight (W1, W2, … Wn). It then computes some function on these weighted inputs to produce the output (Y).

Weights are determined by how important an input is when determining the output. For example, if you were deciding whether or not to go on a relaxing beach date, the weather would be more important than how much money you bring. Therefore, the weather input would have more weight than the money input.

Single-Layer Perceptrons

Single-layer perceptrons are linear binary classifiers and are mainly used in supervised learning. They separate data into two classes and are able to take in multiple weighted inputs.

In the first step of the model, the weighted inputs go through summation. Essentially, inputs are multiplied by their respective weights, and then those values are all added to get the weighted sum. Then the weighted sum is passed to a transfer function (also known as an activation function). The result of this function determines the output.
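The summation and activation steps can be sketched in a few lines, using the beach date decision from earlier. The weights here are picked by hand rather than learned. The bias term (a threshold the weighted sum has to clear) is an addition the text above doesn't cover, and the step function is just one possible activation function.

```python
def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs, followed by a step activation function."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum >= 0 else 0  # 1 = go on the date, 0 = stay home

# Inputs: (weather_is_nice, brought_enough_money), each either 0 or 1.
# Hand-picked, hypothetical weights: weather matters more than money.
WEIGHTS = [0.8, 0.2]
BIAS = -0.5

print(perceptron([1, 0], WEIGHTS, BIAS))  # nice weather, little money -> 1
print(perceptron([0, 1], WEIGHTS, BIAS))  # bad weather, lots of money -> 0
```

Because the weather weight is larger, nice weather alone is enough to clear the threshold, while money alone is not.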

Single layer perceptrons are limited in that there are no hidden layers (only input and output layers). Therefore, you can’t classify non-linearly separable data points, and complex problems cannot be solved by a single-layered perceptron.

Multi-Layer Perceptrons and Artificial Neural Networks

Multi-layer perceptrons have the same structure as single-layer perceptrons, but they have one or more hidden layers and are therefore considered deep neural networks. They still have inputs, weights, and outputs.

A multi-layer perceptron works by passing a set of inputs to the first hidden layer. The activations from that layer are then passed to the next layer, and so on, until the output layer is reached. This can be characterized as a feed-forward network, where each node in a layer is connected to the nodes in the next layer.
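Here is a small sketch of that forward pass on the XOR problem, a classic example of data that is not linearly separable, so a single-layer perceptron cannot solve it. The weights below are set by hand for illustration, not learned, and the step activation and bias values are assumptions.

```python
def step(z):
    return 1 if z >= 0 else 0

def layer(inputs, weights, biases):
    """Compute one layer: each row of weights belongs to one node in the layer."""
    return [
        step(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]

# Hand-set, hypothetical weights.
HIDDEN_WEIGHTS = [[1, 1], [1, 1]]  # node 1 acts like OR, node 2 acts like AND
HIDDEN_BIASES = [-0.5, -1.5]
OUTPUT_WEIGHTS = [[1, -1]]         # fires when OR is true but AND is not
OUTPUT_BIASES = [-0.5]

def forward(inputs):
    hidden = layer(inputs, HIDDEN_WEIGHTS, HIDDEN_BIASES)  # inputs -> hidden layer
    return layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)[0]  # hidden -> output layer

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", forward([a, b]))
```

The hidden layer is what makes this possible: it turns the raw inputs into new features (OR and AND) that the output node can separate with a single line.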

The weights between units of layers are the primary means of long-term information storage in a neural network. Updating the weights is the primary way that the neural network learns new information.

The process of updating weights is referred to as backpropagation. First, the model runs a forward pass: it calculates the weighted sums of the inputs and passes them through the activation functions to get an output. It then compares that output with the desired output to measure the error. Finally, the model propagates that error backwards through the layers and updates the weights to reduce the error and get closer to the desired output.

For instance, in the beach date example, let’s say our model had no context and randomly assigned weights to the money and weather inputs. It might give the money input more weight than the weather input. When it checks its predictions against the known answers in the training data, though, it would be less accurate. So the model may update the weights so that the weather input has more weight than the money input, and when it checks its predictions again, it would be more accurate.
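The sketch below shows this kind of error-driven weight update on a single neuron. It is a simplification: full backpropagation applies the same idea layer by layer using the chain rule, which is left out here. The training data, starting weights, sigmoid activation, and learning rate are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Invented data: (weather_is_nice, brought_money) -> went_on_date.
# In this toy data set, only the weather decides the outcome.
DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 1), ((1, 1), 1)]

def train(weights, bias=0.0, learning_rate=0.5, epochs=1000):
    weights = list(weights)
    for _ in range(epochs):
        for inputs, target in DATA:
            prediction = sigmoid(sum(x * w for x, w in zip(inputs, weights)) + bias)
            error = prediction - target  # gradient of the log loss for a sigmoid neuron
            for i, x in enumerate(inputs):
                weights[i] -= learning_rate * error * x  # shift each weight to reduce the error
            bias -= learning_rate * error
    return weights, bias

def predict(inputs, weights, bias):
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0

# Start with the "wrong" idea that money (0.9) matters more than weather (0.1).
weights, bias = train([0.1, 0.9])
print("weather weight:", round(weights[0], 2), "money weight:", round(weights[1], 2))
```

After training, the weather weight should end up larger than the money weight, even though the model started out believing the opposite.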

The left has a single-layer perceptron, where there are only inputs and outputs. The right has a multi-layer perceptron, which has inputs, hidden layers, and outputs. By Nahua Kang

Recurrent Neural Networks

Feed-forward networks are limited in that they keep no memory of earlier inputs or outputs: each prediction depends only on the current input. Therefore, feed-forward networks cannot be used in cases where the prediction must be based on the previous output.

Recurrent neural networks are a type of neural network designed to recognize patterns in sequences of data, such as texts or genomes. They use information from previous steps in the sequence, along with the current input, to predict the next output.

For example, let’s say you create a model that tries to predict the workout you will do next. You could base your prediction on the day of the week (unrelated to the workout), or you could consider what you did in your previous workout.
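A minimal sketch of the recurrent idea: the network keeps a hidden state that is fed back in at every step, so the same input can produce a different result depending on what came before. The single-number inputs, the weights, and the tanh activation are assumptions for illustration, and nothing here is trained.

```python
import math

def rnn_forward(inputs, w_input=0.5, w_hidden=0.8, bias=0.0):
    """Run a one-unit recurrent network over a sequence and return every hidden state."""
    hidden = 0.0  # the network's "memory", empty at the start
    states = []
    for x in inputs:
        # The new state depends on the current input AND the previous state.
        hidden = math.tanh(w_input * x + w_hidden * hidden + bias)
        states.append(hidden)
    return states

# Both sequences end with the same input (1.0), but their histories differ.
after_activity = rnn_forward([1.0, 1.0])
after_rest = rnn_forward([0.0, 1.0])
print(after_activity[-1], after_rest[-1])
```

A feed-forward network would give the same answer for both final inputs. The recurrent one doesn't, because its hidden state remembers the previous step.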

Convolutional Neural Networks (CNNs)

CNNs are neural networks where a node in a layer is only connected to a small region of the layer before it, instead of all the nodes in the layer before it. They are still made of neurons with learnable weights: each neuron puts weights on its input values and passes the weighted sum through an activation function to get an output. When calculating the output of each layer, each node only takes in some of the information from the previous layer.
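To show what "connected to a small region" means, here is a sketch of a one-dimensional convolution, with no activation function, to keep it short. Each output value only looks at a small window of the input, and the same small set of weights (the kernel) is reused for every window. The signal and kernel values are made up; in a real CNN, the kernel weights are learned and the inputs are usually two-dimensional images.

```python
def conv1d(signal, kernel):
    """Slide the kernel across the signal; each output sees only len(kernel) inputs.
    (As in most deep learning libraries, the kernel is not flipped.)"""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

signal = [0, 0, 1, 1, 1, 0]
edge_kernel = [-1, 1]  # hypothetical kernel that responds to changes in value

print(conv1d(signal, edge_kernel))  # [0, 1, 0, 0, -1]
```

The output is non-zero only where the signal changes, which is a tiny version of how CNN filters pick up edges in images.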

CNNs exist because fully connected networks do not scale well. In large problems with large dimensions, it’s unrealistic to use fully connected networks.

For instance, let’s try to process an image by representing each pixel by its color values. When you put it into a fully connected network, an image with a size of 28 x 28 x 3 needs 2,352 weights for every neuron in the first hidden layer. An image with a size of 200 x 200 x 3 needs 120,000 weights per neuron. Not only is this difficult to process, but it also leads to overfitting, where the network has so many weights that it memorizes the training data instead of learning patterns that generalize, so the output is not what you want.
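The weight counts above are just multiplication, so they are easy to check. The comparison with a convolutional filter is an addition for illustration, and the 5 x 5 filter size is an assumption.

```python
def fully_connected_weights_per_neuron(height, width, channels):
    """A fully connected neuron needs one weight for every input value."""
    return height * width * channels

def conv_filter_weights(filter_height, filter_width, channels):
    """A convolutional filter only needs weights for the small region it covers."""
    return filter_height * filter_width * channels

print(fully_connected_weights_per_neuron(28, 28, 3))    # 2352
print(fully_connected_weights_per_neuron(200, 200, 3))  # 120000
print(conv_filter_weights(5, 5, 3))                     # 75, no matter how big the image is
```

This is why local connections help: the number of weights per filter depends on the size of the filter, not the size of the image.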

Applications of Deep Learning

The applications of deep learning are incredibly vast, but these are some applications I think are pretty cool:

  • Toxicity detection for different chemical compounds — Deep learning models have become so good at predicting the toxicity of different chemical compounds that they can quickly identify toxic compounds that took scientists decades to identify.
  • Water simulations — Fluid mechanics require vast amounts of data to simulate, so using AI/DL is perfect for making fluid simulations.
  • Image colorization — In the past, colorizing black and white images has been extremely difficult. It usually requires the work of experienced artists that have knowledge in color and the context of the image. However, by training models on the characteristics of many settings, they can colorize black and white images much faster.
  • Image generation — The most popular form of image generation is generating human faces. The model is trained on huge datasets of existing people, and creates its own person.
These people don’t exist. They were created by a GAN (type of deep learning) model on https://thispersondoesnotexist.com/.
  • Playing Mario Kart — Deep learning can also be used to play video games, including Mario Kart. Developer SethBling was able to use a neural network to play Mario Kart, and called it MariFlow. You can watch his video here.
  • And more!

Have feedback or questions? Send me an email at chloewang.lv@gmail.com and I’ll be happy to respond!

You can check out more of what I’m up to in my monthly newsletters. Sign up here.


Chloe Wang

18 y/o that loves space, ai, and robotics | MechE @ Caltech