An Introduction to AI

By Imtiaz Adam, DeepLearn007.

What is Artificial Intelligence (AI)?

AI is the field concerned with developing computing systems that are capable of performing tasks humans are very good at, for example recognising objects, recognising and making sense of speech, and making decisions in a constrained environment.

Narrow AI: the field of AI in which the machine is designed to perform a single task and becomes very good at performing that particular task. However, once the machine is trained, it does not generalise to unseen domains. This is the form of AI that we have today, for example Google Translate.

Artificial General Intelligence (AGI): a form of AI that can accomplish any intellectual task that a human being can do. It is more conscious and makes decisions in a way similar to how humans make them. AGI remains an aspiration at this moment in time, with forecasts for its arrival ranging from 2029 to 2049, or even never. It may arrive within the next 20 or so years, but it faces challenges relating to hardware, the energy consumption required by today’s powerful machines, and the need to solve the catastrophic forgetting (memory loss) that affects even the most advanced deep learning algorithms of today.

Super Intelligence: is a form of intelligence that exceeds the performance of humans in all domains (as defined by Nick Bostrom). This refers to aspects like general wisdom, problem solving and creativity.

Classical Artificial Intelligence: algorithms and approaches including rules-based systems, search algorithms entailing uninformed search (breadth first, depth first, uniform cost search), and informed search such as the A and A* algorithms. These laid a strong foundation for today’s more advanced approaches, which are better suited to large search spaces and big data sets. Classical AI also entailed approaches from logic, involving propositional and predicate calculus. Whilst such approaches are suitable for deterministic scenarios, the problems encountered in the real world are often better suited to probabilistic approaches.

The field has been making a major impact in recent times across various sectors including Health Care, Financial Services, Retail, Marketing, Transport, Security, Manufacturing and Travel.

The advent of Big Data, driven by the arrival of the internet, smart mobile devices and social media, has enabled AI algorithms, in particular from Machine Learning and Deep Learning, to leverage Big Data and perform their tasks more optimally. This, combined with cheaper and more powerful hardware such as Graphics Processing Units (GPUs), has enabled AI to evolve into more complex architectures.

Machine Learning

Machine Learning is defined as the field of AI that applies statistical methods to enable computer systems to learn from the data towards an end goal. The term was introduced by Arthur Samuel in 1959.

Key Terms to Understand

Features / Attributes: these are used to represent the data in a form that the algorithms can understand and process. For example, features in an image may represent edges and corners.

Entropy: the amount of uncertainty in a random variable.

Information Gain: the reduction in entropy (uncertainty) obtained as a result of some prior knowledge, for example knowing the value of a feature.
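
As a rough numerical illustration of the two terms above, here is a minimal sketch in Python with NumPy; the toy labels and the hypothetical feature split are invented purely for illustration:

import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Toy labels: 6 positives and 4 negatives.
y = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])
print(entropy(y))                    # ~0.971 bits of uncertainty

# Information gain from a split: imagine a (hypothetical) binary feature separates the data like this.
left, right = y[:6], y[6:]
weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
print(entropy(y) - weighted)         # reduction in entropy = information gain (~0.971 here)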

Supervised Learning: a learning algorithm that works with data that is labelled (annotated). For example learning to classify fruits with labelled images of fruits as apple, orange, lemon, etc.

Unsupervised Learning: is a learning algorithm to discover patterns hidden in data that is not labelled (annotated). An example is segmenting customers into different clusters.

Semi-Supervised Learning: is a learning approach used when only a small fraction of the data is labelled.

Loss Function: is a measure of the difference between the ground truth and what the algorithm has learned to predict. In machine learning the objective is to minimise the loss function so the algorithm can continue to generalise and perform in unseen scenarios.
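
A minimal sketch of one common loss function, mean squared error, using NumPy (the numbers are invented for illustration):

import numpy as np

# Mean squared error: a common loss function for regression.
def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.8, 5.4, 2.0])
print(mse(y_true, y_pred))   # smaller values mean the predictions are closer to the ground truth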

Methods (non-exhaustive list)

Regression, Clustering and Classification are the 3 main areas of Machine Learning.

Reinforcement Learning: is an area that deals with modelling agents in an environment that continuously rewards the agent for making the right decisions. An example is an agent playing chess against a human being: the agent is rewarded when it makes a good move and penalised when it makes a bad one. Once trained, the agent can compete with a human being in a real match.
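
A minimal sketch of the idea in plain Python, using tabular Q-learning on an invented toy environment (the corridor, reward scheme and hyperparameters are all assumptions made for illustration, not something from the original article):

import random

# Tabular Q-learning on a toy 1-D corridor: states 0..4, goal at state 4.
# Actions: 0 = move left, 1 = move right. Reward +1 on reaching the goal, 0 otherwise.
n_states, n_actions, goal = 5, 2, 4
alpha, gamma, epsilon = 0.5, 0.9, 0.3
Q = [[0.0, 0.0] for _ in range(n_states)]

for _ in range(1000):
    s = random.randrange(goal)                      # start each episode in a random non-goal state
    while s != goal:
        if random.random() < epsilon:               # explore occasionally...
            a = random.randrange(n_actions)
        else:                                       # ...otherwise act greedily on current estimates
            a = 0 if Q[s][0] >= Q[s][1] else 1
        s_next = max(0, s - 1) if a == 0 else s + 1
        reward = 1.0 if s_next == goal else 0.0
        Q[s][a] += alpha * (reward + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

# After training, the greedy action in every state should be 1 (move right towards the goal).
print([0 if Q[s][0] >= Q[s][1] else 1 for s in range(goal)])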

Linear Regression: an area of machine learning that models the relationships between two or more variables that have continuous values.
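
A minimal sketch using scikit-learn's LinearRegression on synthetic data (the data-generating rule y ≈ 2x + 1 is an assumption made up for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y is roughly 2x + 1 with a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2 * X[:, 0] + 1 + rng.normal(0, 0.5, size=100)

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)   # should be close to 2 and 1
print(model.predict([[4.0]]))          # prediction for an unseen value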

Decision Trees: an algorithm to learn decision rules inferred from the data. These rules are then followed for decision making.
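
A minimal sketch using scikit-learn's DecisionTreeClassifier on the bundled Iris dataset; the learned decision rules can be printed and followed by hand:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Inspect the decision rules inferred from the data.
print(export_text(clf, feature_names=iris.feature_names))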

k-means: an unsupervised approach to group (or cluster) different instances of data based upon their similarity with each other. An example is to group a population of people based on similarity.
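
A minimal sketch using scikit-learn's KMeans; the tiny "customer" dataset below (two features per person, e.g. age and annual spend) is invented purely for illustration:

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[25, 20], [27, 22], [24, 19],
              [55, 80], [58, 85], [60, 78]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)            # cluster assignment for each person
print(kmeans.cluster_centers_)   # the centre of each group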

Support Vector Machines (SVM): is a classification algorithm that draws a separating hyperplane between two classes of data. Once trained, an SVM model can be used as a classifier on unseen data.
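
A minimal sketch using scikit-learn's SVC with a linear kernel on synthetic, roughly separable data:

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two clusters of points, one per class.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel="linear").fit(X, y)   # learns a separating hyperplane
print(clf.predict([[0.0, 4.0]]))       # classify an unseen point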

Boosting and Ensemble: methods that take several weak learners, each of which performs poorly on its own, and combine them into a strong classifier.
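
A minimal sketch using scikit-learn's AdaBoostClassifier (whose default weak learner is a one-level decision tree) on synthetic data generated for illustration:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Boosting combines many weak one-level decision trees ("stumps") into a stronger classifier.
clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.score(X, y))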

Principal Component Analysis (PCA): is a method to reduce the dimensionality of the data whilst still retaining as much of the variation in the data as possible. This is useful for getting rid of redundant information present in the data whilst preserving the features that explain most of the data.
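
A minimal sketch using scikit-learn's PCA on synthetic data deliberately built from two underlying factors, so that most of the variation survives the reduction to two components:

import numpy as np
from sklearn.decomposition import PCA

# 10 observed features generated from only 2 underlying factors plus a little noise,
# so much of the information in the data is redundant.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(200, 10))

pca = PCA(n_components=2).fit(X)             # keep the 2 directions of greatest variance
X_reduced = pca.transform(X)
print(X_reduced.shape)                       # (200, 2)
print(pca.explained_variance_ratio_.sum())   # close to 1: two components explain almost everything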

Simultaneous Localisation And Mapping (SLAM): deals with methods that robots use to localise themselves in unknown environments whilst simultaneously building a map of that environment.

Random Forest: belongs to the family of ensemble learning techniques and entails the creation of multiple decision trees during training, with randomness injected while constructing each tree. The output is then the mean prediction of the individual trees (for regression) or the modal class (for classification). This helps prevent the algorithm from overfitting or memorising the training data.
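
A minimal sketch using scikit-learn's RandomForestClassifier on the bundled Iris dataset:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The forest combines the predictions of 100 randomised decision trees.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(clf.score(X_test, y_test))   # accuracy on data held out from training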

Logistic Regression: is a classification technique that models the logit function as a linear combination of features. Binary logistic regression deals with situations where the variable you are trying to predict has two outcomes (‘0’ or ‘1’). Multinomial logistic regression deals with situations where the predicted variable could take multiple different values.
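
A minimal sketch of binary logistic regression using scikit-learn on the bundled breast cancer dataset (the feature-scaling step is a practical choice made here, not part of the definition above):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)   # binary outcome: 0 or 1

# Scale the features, then fit a binary logistic regression.
clf = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
print(clf.predict(X[:5]))        # predicted classes
print(clf.predict_proba(X[:5]))  # modelled probabilities from the logistic (sigmoid) of the linear combination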

Evolutionary Genetic Algorithms: these are biologically inspired algorithms based on the theory of evolution. They are frequently applied to solve optimisation and search problems by applying bio-inspired operations including selection, crossover and mutation.
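
A minimal sketch of a genetic algorithm in plain Python; the "OneMax" toy problem (evolving a bit string towards all ones) and the hyperparameters are assumptions chosen purely for illustration:

import random

def fitness(bits):
    return sum(bits)                                   # more ones = fitter individual

def evolve(pop_size=20, length=16, generations=50, mutation_rate=0.05):
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                 # selection: keep the fittest half
        children = []
        while len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, length)          # crossover: splice two parents
            child = a[:cut] + b[cut:]
            child = [1 - g if random.random() < mutation_rate else g for g in child]  # mutation
            children.append(child)
        pop = children
    return max(pop, key=fitness)

print(evolve())   # usually converges to (or near) a string of all ones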

Neural Networks: are biologically inspired networks that extract abstract features from the data in a hierarchical fashion. Neural networks fell out of favour in the 1980s and 1990s; it was Geoff Hinton who continued to push them and was derided by much of the classical AI community at the time. A key moment in the history of Deep Neural Networks (see below for a definition) came in 2012, when a team from Toronto introduced themselves to the world with the AlexNet network at the ImageNet competition. Their neural network reduced the error significantly compared to previous approaches that used hand-crafted features.

Deep Learning

Deep Learning refers to the field of neural networks with several hidden layers. Such a neural network is often referred to as a deep neural network.

A few of the main types of deep neural networks used today are:

Convolutional Neural Network (CNN): A convolutional neural network is a type of neural network that uses convolutions to extract patterns from the input data in a hierarchical manner. It is mainly used with data that has spatial relationships, such as images. Convolution operations slide a kernel over the image to extract features that are relevant to the task.
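
A minimal sketch of a small CNN built with the Keras API (assuming TensorFlow/Keras is installed; the input size and layer sizes are arbitrary choices for illustration, not something prescribed here):

from tensorflow.keras import layers, models

# A small CNN for 28x28 grayscale images (e.g. handwritten digits).
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),   # slide 3x3 kernels over the image to extract local features
    layers.MaxPooling2D((2, 2)),                    # downsample, keeping the strongest responses
    layers.Conv2D(64, (3, 3), activation="relu"),   # deeper layers combine them into higher-level patterns
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),         # class probabilities for 10 categories
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()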

Recurrent Neural Network (RNN): Recurrent Neural Networks, and in particular LSTMs, are used to process sequential data. Time series data, for example stock market data, speech, signals from sensors and energy data, have temporal dependencies. LSTMs are a more effective type of RNN that alleviates the vanishing gradient problem, giving them the ability to remember information both from the recent past and from far back in the history.
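
A minimal sketch of an LSTM model in Keras for a univariate sequence (the sequence length and layer size are arbitrary choices for illustration):

from tensorflow.keras import layers, models

# An LSTM for sequences of 50 time steps with 1 feature each (e.g. a sensor or price series).
model = models.Sequential([
    layers.Input(shape=(50, 1)),
    layers.LSTM(32),   # gating units let the network retain information over long ranges
    layers.Dense(1),   # predict the next value in the sequence
])
model.compile(optimizer="adam", loss="mse")
model.summary()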

Restricted Boltzmann Machine (RBM): is a type of neural network with stochastic properties. Restricted Boltzmann Machines are trained using an approach named Contrastive Divergence. Once trained, the hidden layer provides a latent representation of the input. RBMs learn a probabilistic representation of the input.
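
A minimal sketch using scikit-learn's BernoulliRBM on invented binary data; note that scikit-learn's implementation is fitted with Persistent Contrastive Divergence, a close relative of the Contrastive Divergence training mentioned above:

import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 16)).astype(float)   # toy binary input vectors

rbm = BernoulliRBM(n_components=8, learning_rate=0.05, n_iter=20, random_state=0)
H = rbm.fit_transform(X)   # hidden-unit activations: a latent representation of the input
print(H.shape)             # (100, 8)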

Deep Belief Network: is a composition of Restricted Boltzmann Machines with each layer serving as a visible layer for the next. Each layer is trained before adding additional layers to the network which helps in probabilistically reconstructing the input. The network is trained using a layer-by-layer unsupervised approach.

Variational Autoencoders (VAE): are an improved version of autoencoders used for learning an optimal latent representation of the input. They consist of an encoder and a decoder with a loss function. VAEs use probabilistic approaches and perform approximate inference in a latent Gaussian model.

GANs: Generative Adversarial Networks are a type of neural network architecture (often built from CNNs when working with images) that uses a generator and a discriminator. The generator continuously generates data while the discriminator learns to discriminate fake data from real data. This way, as training progresses, the generator gets better at generating fake data that looks real while the discriminator gets better at telling the difference between fake and real, in turn helping the generator to improve itself. Once trained, we can use the generator to generate fake data that looks realistic. For example, a GAN trained on faces can be used to generate images of faces that do not exist yet look very real.

Deep Reinforcement Learning: Deep Reinforcement Learning algorithms deal with modelling an agent that learns to interact with an environment in the most optimal way possible. The agent continuously takes actions keeping the goal in mind, and the environment either rewards or penalises the agent for a good or bad action respectively. This way, the agent learns to behave in the most optimal manner so as to achieve the goal. AlphaGo from DeepMind is one of the best examples, where the agent learned to play the game of Go and was able to compete with a human being.

Capsules: still an active area of research. A CNN is known to learn representations of data that are often not interpretable. A Capsule network, on the other hand, is designed to extract more specific kinds of representations from the input, for example preserving the hierarchical pose relationships between object parts. Another advantage of capsule networks is that they are capable of learning such representations with a fraction of the data that a CNN would otherwise require.

Bio: Imtiaz Adam is focussed on the development of Artificial Intelligence and Machine Learning techniques with a particular focus on Deep Learning.

Original. Reposted with permission.
