Understanding the power of deep learning and how it works
One reason deep learning confuses many people is that the term is often used alongside Machine Learning (ML) and Artificial Intelligence (AI). Deep Learning (DL) is a subset of ML, which is itself a subset of AI. A DL algorithm is able to learn hidden patterns from the data by itself, combine them, and build much more effective decision rules.
Deep learning performs better as the amount of data grows, making it popular in industries that collect massive amounts of data. These industries include manufacturing, automotive, hospitality, healthcare, banking, agriculture, entertainment, IT/Security, retail, and supply chain and logistics.
The controversy surrounding deep learning (as well as surrounding AI) is the fear of the 'black box'. That is, how can anyone base a service or product on deep learning and trust the decisions being made if no one knows how they’re being made?
A deeper look at deep learning
One of the key reasons deep learning is more powerful than classical machine learning is that it creates transferable solutions. Deep learning algorithms do this through neural networks: that is, layers of neurons/units.
A neuron takes input and outputs a number that assigns the input to a class (group). The output is determined much the way you would make a decision: by weighing several factors, some of which matter more than others. A neuron similarly takes multiple inputs, each with a corresponding weight (importance). The weighted inputs are summed and passed through an activation function, which gives the final output.
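As a rough illustration of a single neuron, the sketch below sums the weighted inputs and passes the result through an activation function. The sigmoid activation, the bias term, the 0.5 threshold, and all the numbers are assumptions chosen for the example; the text above does not prescribe them.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


# Hypothetical example: two inputs, the first one weighted as more important.
output = neuron(inputs=[1.0, 0.5], weights=[2.0, -1.0], bias=-0.5)

# Assign the input to class 1 if the output crosses 0.5, otherwise class 0.
predicted_class = 1 if output >= 0.5 else 0
```

Changing a weight changes how strongly the corresponding input pushes the output towards one class or the other, which is what training adjusts.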
Many deep learning problems boil down to classification - whether binary (e.g., is this image a cat, or not a cat?) or multiclass (e.g., is this image a cat, a dog, or a bird?). So finding the optimal features (variables) and parameters (weights) is key. A model can be built with a single layer of neurons, and adding layers lets the computer create more and more specific features that lead to a more complex final output.
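To make the idea of stacked layers concrete, here is a minimal sketch of a forward pass through two layers that ends in a multiclass decision. The ReLU and softmax functions, the layer sizes, and the hand-picked weights are all assumptions for illustration; a real network would learn its weights during training.

```python
import math


def relu(z):
    """Pass positive values through and clip negative values to zero."""
    return max(0.0, z)


def dense(inputs, weights, biases, activation):
    """One layer of neurons: each row of `weights` belongs to one neuron."""
    return [
        activation(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical input features and hand-picked weights.
features = [0.8, 0.2]
hidden = dense(features, [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]], [0.0, 0.0, 0.0], relu)
scores = dense(hidden, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
               [0.0, 0.0, 0.0], lambda z: z)
probabilities = softmax(scores)

labels = ["cat", "dog", "bird"]
prediction = labels[probabilities.index(max(probabilities))]
```

The first layer turns the raw inputs into intermediate features, and the second layer combines those features into one score per class.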
Understanding gradient descent
Understanding gradient descent is helpful for understanding deep learning because it’s one of the most popular - if not the most popular - strategies for optimising a model during training (that is, making sure it's 'learning' correctly).
Remember that in deep learning, it's the algorithm that finds the features for the most accurate classification (instead of the human, as is the case in classical machine learning), so the computer needs a way to determine the optimal features and weights - that is, ones that lead to the most accurate final classification.
The details of gradient descent
This happens by choosing the features and weights that minimise some error/cost function. The error/cost function is the sum of the losses over all data points (each loss measures the gap between the predicted value and the actual value of a point) plus a regularisation term. The regularisation term penalises overly complex models, such as those with many features, to prevent overfitting (being accurate for a specific dataset but failing to generalise).
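One possible form of such a cost function is sketched below. The squared-error loss, the L2 penalty on the weights, the `reg_strength` parameter, and the sample points are assumptions for the example; other losses and penalties are just as common.

```python
def cost(weights, bias, points, reg_strength):
    """Sum of squared errors over all points plus an L2 penalty on the weights."""
    total_loss = 0.0
    for features, actual in points:
        predicted = sum(x * w for x, w in zip(features, weights)) + bias
        total_loss += (predicted - actual) ** 2
    # The penalty grows with the size of the weights, discouraging complex models.
    penalty = reg_strength * sum(w * w for w in weights)
    return total_loss + penalty


# Hypothetical data lying exactly on the line y = 2x.
points = [([1.0], 2.0), ([2.0], 4.0), ([3.0], 6.0)]

perfect_fit = cost([2.0], 0.0, points, reg_strength=0.0)
penalised_fit = cost([2.0], 0.0, points, reg_strength=0.1)
poor_fit = cost([1.0], 0.0, points, reg_strength=0.0)
```

With the penalty switched on, even a perfect fit carries a small cost, so the optimiser has to trade accuracy on the training data against model complexity.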
To minimise our error function, we use gradient descent: the computer starts with certain parameters (features and weights) and repeatedly moves them in the direction of the negative gradient of the error function (the gradient points in the direction of greatest increase, so the negative gradient points in the direction of greatest decrease) until it finds the parameters that lead to a gradient of 0 (corresponding to a minimum of the error function).
It works like getting to the lowest point on a mountain as quickly as possible: you walk in the direction of steepest decrease until you hit a minimum. For example, here we keep adjusting the line until we have minimised the classification error (larger dots correspond to larger errors).
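The walk downhill can be sketched in a few lines. The example below fits a straight line to a handful of points rather than classifying them, and the mean squared error, the learning rate, the step limit, and the data are all assumptions chosen to keep the sketch small.

```python
def gradient_descent(points, learning_rate=0.05, steps=2000, tolerance=1e-9):
    """Fit y = w * x + b by repeatedly stepping against the gradient."""
    w, b = 0.0, 0.0
    n = len(points)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in points) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in points) / n
        # Stop once the gradient is (almost) zero: we have reached a minimum.
        if abs(grad_w) < tolerance and abs(grad_b) < tolerance:
            break
        # Step in the direction of the negative gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical points lying on the line y = 2x + 1.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = gradient_descent(points)
```

The learning rate controls the size of each step: too small and the walk is slow, too large and it can overshoot the minimum.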
Types of neural networks
There are countless types of neural networks. Here is an overview of some of the most relevant types:
- Feed Forward - Used in computer vision and speech recognition when classifying the target classes is complicated. Responsive to noisy data and easy to maintain.
- Radial Basis - Considers the distance of a point with respect to the centre. Used for power restoration systems, which are notoriously complicated.
- Kohonen - Recognises patterns in data. Used in medical analysis to cluster data into different categories (a Kohonen network was able to classify patients with a diseased glomerulus vs. a healthy one).
- Recurrent - Feeds the output of a layer back as input. Good for predicting the next word in a body of text, but harder to maintain.
- Modular - A collection of different networks that work independently and contribute towards the final output. Increases computation speed (through the breakdown of a complicated computational process into simpler computations), but processing time is subject to the number of neurons.
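The feedback loop described for the Recurrent type above can be sketched with a single unit whose previous output is fed back in alongside each new input. The tanh activation, the weights, and the input sequence are assumptions for illustration only.

```python
import math


def recurrent_step(x, previous_output, w_input, w_recurrent, bias):
    """Combine the new input with the unit's own previous output."""
    return math.tanh(w_input * x + w_recurrent * previous_output + bias)


def run_sequence(sequence, w_input=0.5, w_recurrent=0.8, bias=0.0):
    """Process a sequence one item at a time, carrying the output forward."""
    output = 0.0  # nothing has been seen before the first item
    outputs = []
    for x in sequence:
        output = recurrent_step(x, output, w_input, w_recurrent, bias)
        outputs.append(output)
    return outputs


# Hypothetical sequence: a single signal followed by silence.
outputs = run_sequence([1.0, 0.0, 0.0])
```

Even though the later inputs are zero, the outputs stay above zero and fade gradually, which shows how the unit carries a memory of earlier items in the sequence.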
There are also the increasingly popular Convolutional Neural Networks (CNNs), which we discuss in more depth in this guide.
Real life applications and the future of deep learning
The future of deep learning is bright because of its open source community and accessible platforms. Increasingly, corporations such as Apple, Facebook, and Google are making their technology accessible to the public. In the near future, deep learning will significantly improve voice command systems (such as Siri and Alexa), as well as healthcare and image identification.
Deep learning has applications across numerous industries, which is why experts think that this technology is the future of almost everything. There are already notable deep learning developments, such as Google’s very human-like talking AI, a new theory that cracks the 'black box' of deep learning, and various budding ideas, like the one about why human forgetting might be the key to AI.
The new theory is especially important given the uptick in data regulation worldwide. The General Data Protection Regulation (GDPR), in particular, has made global companies think about how and where to use deep learning.