What Killed the Curse of Dimensionality?

How and why does Deep Learning overcome this hurdle in machine learning?

To begin, we should clearly define the Curse of Dimensionality:

A phenomenon in which, as the dimensionality of the data increases, the sparsity of the data increases.

Data has dimensionality to it. The more dimensions that are added to the data, the more difficult it becomes to find patterns. Think of dimensionality as the range of movement of an animal you’re playing tag with. If you’re chasing an animal that can only move on the ground, it can only go in 2 dimensions: left or right (x), forward or backward (y). It’s harder to catch a bird because the bird can move in 3 dimensions: left or right (x), forward or backward (y), up or down (z). We can imagine some mythical time-traveling beast that can move in 4 dimensions: left or right (x), forward or backward (y), up or down (z), past or future (t). You can see this gets more difficult as the dimensions increase.

This same problem applies to data and machine learning. As the data’s dimensionality increases, the sparsity of the data increases, making it harder to ascertain a pattern. There are ways around the curse of dimensionality in traditional ML that rely on techniques such as function smoothing and approximation. Deep Learning has overcome this “curse” by its inherent nature, and that may be one of the contributors to its increased popularity.
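To make that sparsity concrete, here’s a minimal sketch in Python (using NumPy and SciPy, which aren’t part of the original post; the function name is just mine for illustration). It samples the same number of points in a unit hypercube of increasing dimension and measures how far apart nearest neighbors end up:

```python
import numpy as np
from scipy.spatial.distance import cdist

def mean_nearest_neighbor_distance(n_points=500, n_dims=2, seed=0):
    """Sample points uniformly in a unit hypercube and return the
    average distance from each point to its nearest neighbor."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))
    dists = cdist(points, points)    # pairwise Euclidean distances
    np.fill_diagonal(dists, np.inf)  # ignore each point's distance to itself
    return dists.min(axis=1).mean()

# The same number of points spreads farther apart as dimensions are added.
for d in (2, 10, 100, 1000):
    print(f"{d:>4} dims -> mean nearest-neighbor distance: "
          f"{mean_nearest_neighbor_distance(n_dims=d):.2f}")
```

The nearest-neighbor distances keep growing as dimensions are added even though the number of points stays the same. That spreading out is the sparsity that makes pattern-finding harder.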

Deep Learning

In high-dimensionality applications, deep learning does not suffer the same consequences as other machine learning algorithms such as Linear Regression. This fact is part of the magic that makes modeling with Neural Networks so effective. Neural Networks’ imperviousness to the curse of dimensionality is a helpful characteristic in today’s world of big data.

There are multiple theories on why this occurs. We will now quickly go over the main ideas of each:

Linear manifold

Manifold Hypothesis [1]

At a high level, the Manifold Hypothesis suggests that high-dimensional data actually sits on a lower-dimensional manifold embedded in higher-dimensional space.

In a hand-wavy sense, this means that within high-dimensional data there is some underlying pattern in lower dimensions that deep learning methods are good at exploiting. So, given a high-dimensional matrix that represents images, neural networks excel at finding low-dimensional features that are not apparent in the high-dimensional representation.

Image from the Deep Learning book: a manifold over high-dimensional data

The image above represents a bunch of data points in a high-dimensional space. It is drawn in 2-D for easy representation in this blog post. The image below it shows that there is some manifold in this high-dimensional space where most of the data lies. Neural Networks and Deep Learning methods exploit that and, theoretically, find that manifold.
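As a rough illustration of the idea (not the method from the book or this post), here’s a sketch using NumPy and scikit-learn. The data is nominally 100-dimensional but is generated from just 2 latent factors, and PCA, a simple linear stand-in for the nonlinear manifolds a neural network can learn, recovers that the data effectively lives in 2 dimensions:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Data that is nominally 100-dimensional but really lies on a
# 2-dimensional (linear) manifold: 2 latent factors pushed through
# a random linear map, plus a little noise.
latent = rng.normal(size=(1000, 2))
lift = rng.normal(size=(2, 100))
X = latent @ lift + 0.01 * rng.normal(size=(1000, 100))

pca = PCA(n_components=5).fit(X)
print(pca.explained_variance_ratio_.round(3))
# The first two components capture nearly all of the variance,
# reflecting the 2-D structure hidden inside the 100-D representation.
```

Real image data is believed to lie on curved rather than linear manifolds, which is where the stacked nonlinear layers of a deep network come in.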

Visualization of activations from different neurons of a Convolutional Neural Network

Sparse Coding [2]

Sparse coding describes what happens as data flows through the different neurons in a network. Each neuron in the neural network has its own activation function, and when a neuron fires on its activation it contributes to a sparse code for the input.

The image above depicts a visualization of the activations of a neural network. Each individual neuron picks up on a different feature found in the image it is fed. The representation shows how the different neurons pick up on different features and each individually contributes to the final output of the network.

For example, the English language is composed of 26 letters, and those letters can make all the words of the language. In terms of neural networks, there is some set of neurons whose firings, combined with the firings of other neurons, can be composed to output the correct answer no matter the dimensionality of the inputs.
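Here’s a toy numerical sketch of that idea in NumPy. It’s an untrained layer with random weights and a deliberately negative bias, chosen only so the ReLU activations come out sparse; none of it comes from the original post:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy hidden layer: random weights and a negative bias so that only a
# small fraction of units fires (ReLU activation) for any given input.
n_inputs, n_hidden = 100, 256
W = rng.normal(size=(n_inputs, n_hidden))
b = -15.0  # pushes most pre-activations below zero -> sparse firing

def hidden_activations(x):
    """ReLU activations of the toy hidden layer for one input vector."""
    return np.maximum(0.0, x @ W + b)

x1, x2 = rng.normal(size=n_inputs), rng.normal(size=n_inputs)
h1, h2 = hidden_activations(x1), hidden_activations(x2)

print(f"input 1 fires {np.count_nonzero(h1)} of {n_hidden} neurons")
print(f"input 2 fires {np.count_nonzero(h2)} of {n_hidden} neurons")
print(f"shared neurons: {np.count_nonzero((h1 > 0) & (h2 > 0))}")
```

Each input lights up only a small, mostly different subset of neurons, and the network’s output is built from combining those few active units, much like words are built from a small alphabet of letters.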

Conclusion

There’s no single proven theory that explains why Neural Networks overcome the curse of dimensionality. There is a lot of research going on to understand the different aspects and underlying characteristics that allow Deep Learning and Neural Networks to be so effective in practice. Until then, we at least have a good idea of the kinds of concepts at work under the hood that fuel the use of deep learning and neural networks in practice.

Further Reading:

Lecture from Partha Niyogi on the Manifold Hypothesis [1] & [2]: https://www.ima.umn.edu/2008-2009/SW10.27-30.08/6687

Deep Learning Book, chapter on manifolds [1]: http://www.deeplearningbook.org/version-2015-10-03/contents/manifolds.html

Wikipedia on sparse coding [2]: https://en.wikipedia.org/wiki/Neural_coding#Sparse_coding

Write-up on Sparse Coding [2]: http://redwood.berkeley.edu/vs265/handout-sparse-08.pdf

Inspired by this reddit post: https://goo.gl/jJFtP1