Data Analytics - Artificial Intelligence (AI)

## Introduction

In this tutorial Artificial Intelligence (AI) topics are considered.

We continue with clustering (see Machine Learning tutorial using the expectation maximisation (EM) algorithm. This falls into the area of unsupervised probabilistic learning

Two more supervised-learning methods are considered: artificial neural networks (ANN) and support vector machines (SVM). Their accuracy will be compared to the linear regression model. Predictions and classifications minimise the error between given and predicted response.

Problem solving is achieved using optimisations. A brute-force approach and genetic algorithm (GA) will be introduced. Simulated annealing (SA) and exact methods are briefly mentioned.

A short introduction to Bayesian Belief Networks (BBN) will be given, which is commonly used when reasoning under uncertainty.

## Probabilistic Learning - EM

We will use the iris dataset to identify three species. We assume the classification has not happen, hence, clustering is required.

Pairwise scatterplots show us the relationship of variables. This is usually the first step in an anlysis.

library(mclust)
clPairs(iris[,1:4])

There is another library GGally, which provides an even more aesthetic visualisations.

library(GGally)
ggpairs(iris[,1:4])

The diagonal shows the density distribution of each feature. The upper-triangular matrix contains the Pearson-correlation coefficients. Here, wee see that petal length and sepal length are strongly correlated. Similarly, the petal’s length and width are strongly correlated.

### Expectation maximisation

Expectation maximisation (EM) is a probabilistic learning method. The algorithm is used in the library mclust.

First we run the model with the default parameters. How many cluster have been identified?

# library(mclust)
em = Mclust( iris[,1:4], verbose = FALSE)
summary(em)

Answer: two with 50 and 100 nodes.

Now, we let the EM algorithm find three clusters using the G variable. How well do the clusters agree with the defined species.

em2 <- Mclust(iris[,1:4], G = 3, verbose = FALSE)
summary(em2)

table( em2$classification, iris$Species)

Answer: the first cluster agrees perfectly. The second and third have five flowers identified differently.

## Learning - ANN & SVM

We will look at predicting eruptions. Please see the statistical learning tutorial for a gentle introduction.

### General initialisations

We begin by defining two accuracy measures: root mean squared error (RMSE) and mean absolute percentage error (MAPE). These will be used to compare linear regression, ANN and SVM against each other. Our function derr will display these measures.

#cat('\14'); rm(list=ls());graphics.off(); # for RStudio

rmse <- function(y,yh) {sqrt(mean((y-yh)^2))}
mape <- function(y,yh) {mean(abs((y-yh)/y))}
derr <- function(y,yh) {cat("MAPE:", mape(y, yh)*100,"%, RMSE:", rmse(y, yh))}

We need to do some general data pre-processing to evaluate the test-accuracy of the model.

We will use the faithful dataset (see Wikipedia for some background information) and ?faithful for a data description.

data(faithful); str(faithful)
## 'data.frame':    272 obs. of  2 variables:
##  $eruptions: num 3.6 1.8 3.33 2.28 4.53 ... ##$ waiting  : num  79 54 74 62 85 55 88 85 51 85 ...

Your first task is to divide the data into train and test data. Create sample indices for approximately 70% of the faithful data used for training purposes. For reproducibility initialise the random number generator with seed 7. We will need the test-response repeatedly, hence, save it as $$y$$.

n = nrow(faithful); ns = floor(.7*n);
set.seed(7); I = sample(1:n,ns);

train = faithful[I,]
test  = faithful[-I,]
y  = test$eruptions # will be used frequently ### Linear model (for comparisons) We begin with a quick review of linear regression (see SL-LR for details). flm = lm(eruptions ~ waiting, data = train) yh = predict(flm, newdata = test) ### Test-accuracy derr(y, yh) Visual the models predictions for the entire data set. ggplot(faithful, aes(waiting,eruptions)) + geom_point() + geom_point(data=data.frame(waiting = faithful$waiting,
eruptions = predict(flm, newdata = faithful)), colour="purple")

### ANN model

We will create a neural network with one hidden layer that contains three neurons. The ANN requires the library neuralnet. Other interesting neural network libraries are: nnet, gamlss and RSNNS.

Task, use the above neural network to predict the eruption times for 40, 60, 80 and 100 waiting-time.

library(neuralnet)

ann = neuralnet( eruptions ~ waiting,
data = train,
hidden = 3,
linear.output = TRUE)
### Usage
predict(ann, data.frame(waiting=c(40, 60,80,100)))

That means iw we wait 80 minutes the expected eruption time will be 4.38 minutes.

what is the test-accuracy (MAPE, RMSE) for this model?

yh = predict(ann, newdata = test)
derr(y, yh)

That means, this ANN MAPE is approximately five percent better than the linear regression.

The ANN can be visualised using the common plot command. (Note: only works in RStudio currently)

plot(ann)

You will see the following graph:

The library NeuralNetTools allows another way to plot ANNs using plotnet(ann). There is a blog discussing visualisation of neural networks.

Visual the models predictions for the entire data set.

ggplot(faithful, aes(waiting,eruptions)) + geom_point() +
geom_point(data=data.frame(waiting = faithful$waiting, eruptions = predict(ann, newdata = faithful)), colour="blue") #### ANN model with two hidden layers Now we will create an ANN with two hidden layers. The first hidden layer contains three neurons and the second one four neurons. ann2l = neuralnet( eruptions ~ waiting, data = train, hidden = c(3,4), linear.output = TRUE) yh = predict(ann2l, newdata = test) derr(y, yh) We observe that the algorithm’s running time is longer (it may not even converge). If it converges it’s accuracy is lower than before. plot(ann2l) ### Support Vector Machines (SVM) A support vector machine model with default configuration is created. The library e1071 is required. sm = svm (eruptions ~ waiting, data = train) yh = predict(sm, newdata = test) ### Usage predict(sm, data.frame(waiting=c(40, 60,80,100))) Test-accuracy yh = predict(sm, newdata = test) derr(y, yh) We note that the test-accuracy is better than the one from the ANNs. Visual the models predictions for the entire data set. ggplot(faithful, aes(waiting,eruptions)) + geom_point() + geom_point(data=data.frame(waiting = faithful$waiting,
eruptions = predict(sm, newdata = faithful)), colour="red")

## Problem solving - GA & SA

### Knapsack problem

A knapsack with a weight restriction $$\hat{w}$$ has to be packed with items. Each item $$i$$ has a weight and a value $$v_i$$.

The objective is to maximise the total value by deciding whether to pack or not pack an item $$x_i$$, i.e. $$x_i \in \lbrace 0,1\rbrace$$

$\max_{x} \lbrace\sum_{i=1}^n v_i x_i : \sum_{i=1}^n w_i x_i \leq \hat{w} \rbrace$

By the way, this is the same as: $$\max_{x} \lbrace vx : wx \leq \hat{w} \rbrace$$ in vector notation.

Let us assume $$v = \begin{bmatrix}6& 5& 8& 9& 6& 7& 3\end{bmatrix}$$, $$w = \begin{bmatrix}2&3&6&7&5&9&4\end{bmatrix}$$ and $$\hat{w} = 9$$. What items should be packed?

We begin with defining the data.

v  = c(6, 5, 8, 9, 6, 7, 3) # values
w  = c(2, 3, 6, 7, 5, 9, 4) # weights
wh = 9 # weight limit (including values)
n  = length(v) # number of elements (required later)

#### Warmup

Let $$x$$ be decision vector, which items have to be packed. Define x such that item 1 and 3 are packed. What is the value and weight of packing item 1 and 3 use the dot-product.

x = c(1,0,1,0,0,0,0)
v %*% x # value
w %*% x # weight

### Brute Force

Write a function that finds the optimal solution using a brute force approach. That means all configurations (here, packing options) are tested, and all feasible solutions are evaluated (Note: you may want to do a few preparation exercises first - see below).

# Brute force (maximise value)
knapsackBF <- function(v, w, wh) {
n = length(v); best.v = -Inf;
for (nb in (0:2^7-1)) {
x = as.integer( intToBits(nb) )[1:n]
if (w %*% x <= wh) { # then weight constraint ok
if (v %*% x > best.v) { # then new best value
best.v = v %*% x
sol = x
cat('New solution:', x,">> value:", best.v,"\n")
}
}
}
cat("Best solution value:", best.v, "\nusing x = ", sol, "\n")
return (sol)
}
x = knapsackBF(v, w, wh)

That means item 1 and 4 need to be packed (use which(x==1)). These items have a value of 15 = v %*% x (individual values are 6 and 9 = v[sol==1]) with weights 2 and 7.

#### Practice (preparational exercises)

Convert the number 6 into bits using intToBits() and convert the bits back into integers.

bits = intToBits(6)
as.integer(bits)

### Genetic Algorithm

We will use a Genetic Algorithm with binary chromosome algorithm (rbga.bin) to solve the above Knapsack problem. The algorithm is part of the genalg library.

The algorithm requires an evaluation function, which contains the constraints and objective function. Note that rbga.bin only minimises, i.e. we need to invert the objective to maximise.

Your task is to write the evaluation function evalFun for the Knapsack problem. Assume that $$v$$, $$w$$ and $$\hat{w}$$ are known in the global environment; and $$x$$ is the input. Return $$-v\cdot x$$ if the weight is less than $$\hat{w}$$, otherwise $$+\infty$$. Test the function with $$x = \begin{bmatrix}1&1&0&0&0&0&0\end{bmatrix}$$. Is the solution feasible? If yes, what is the solution value?

evalFun <- function(x) {
if (w %*% x <= wh) { # then weight constraint ok
return ( -(v %*% x) ) # invert value to maximise
} else { return(+Inf)}
}

evalFun(c(1,1,0,0,0,0,0)) # test

Answer: The solution is feasible and the solution value is -11.

Now let us call the GA using most of the default parameter settings. What is the best solution and its value?

myGA <- rbga.bin(evalFunc = evalFun, size = n)

# Best solution
best.v = -min(myGA$best) # best found solution value in each iteration idx = which.min(myGA$evaluations) # index of individual with best value
sol = myGA$population[idx,] # best found solution cat("Best solution value:", best.v, "\nusing x = ", sol, "\n") Answer: the best found solution (in this case it is also the best solution) is $$x = \begin{bmatrix}1&0&0&1&0&0&0\end{bmatrix}$$ with solution value 15. What default parameters are shown using the summary function on myGA? cat(summary(myGA)) Answer: population size, number of generations, elitism and mutation chance. Change the mutation chance (mutationChance) to $$\frac{1}{n}$$, population size (popSize) to 20 and the number of iterations (iters) to 40. Did the algorithm still return the same solution? Did the algorithm find the solution faster? myGA <- rbga.bin(evalFunc = evalFun, size = n, mutationChance = 1/n, popSize = 20, iters = 40) best.v = -min(myGA$best); sol = myGA$population[which.min(myGA$evaluations),]
cat("Best solution value:", best.v, "\nusing x = ", sol, "\n")

Answer: the same solution was found and the algorithm ran faster. Note: results may vary because of different random number generation.

### Simulated Annealing

Simulated Annealing is a meta-heuristic used for optimisation problems. Please explore TSP blog and related Github repository.

### Exact Approaches

The MathProg Solver on Smartana.org includes several examples to get you started with formulating optimisation problems.

Please see: Examples >> Basic Optimisations >> Knapsack to find the mathematical program to solve the Knapsack problem exactly.

## Games - minimax

We will implement the minimax algorithm and integrate $$\alpha$$-$$\beta$$ pruning into it.

Our objective is to create a simple example to test our implementation of the minimax algorithm. We assume a simplified chess game. White has two options to make a move. Similarly, black has two options to respond. At a game-tree depth three all positions are evaluated. The figure below shows the minimax example, we’d like to reproduce.

Each node name in this example represents a position (state).

Functions for the positions, node evaluations and game-over state are hard-coded, but can be easily replaced for other games.

getPositions <- function(pos){
switch (pos,
'w1'=c('b1','b2'),
'b1'= c('w2','w3' ),
'b2'= c('w4','w5' ),
'w2'= c('b3','b4' ),
'w3'= c('b5','b6' ),
'w4'= c('b7','b8' ),
'w5'= c('b9','b10'))
}
cat("Positions resulting from b1:", getPositions('b1'), "\n") # test

gameOver <- function (pos) {
return(FALSE) # always returns FALSE (template code)
}

evaluate <- function(pos) {
switch(pos,
'b3' = -2,
'b4' =  2,
'b5' =  6,
'b6' =  0,
'b7' = -5,
'b8' = -3,
'b9' = -1,
'b10'=  7)
}
cat("Value of position b3 is ", evaluate('b3'), "\n") # test

Next we will implement the minimax algorithm.

minimax <- function(pos='w1', depth = 1, player = TRUE){
if (depth == 0 || gameOver(pos)) {
return (evaluate (pos))
}

if (player) {
eval.max = -Inf;
for (p in getPositions(pos)){
eval.p = minimax(p, depth-1, FALSE)
eval.max = max(eval.max, eval.p)
}
return (eval.max)
} else {
eval.min = +Inf;
for (p in getPositions(pos)){
eval.p = minimax(p, depth-1, TRUE)
eval.min = min(eval.min, eval.p)
}
return (eval.min)
}
}

### Test function
cat("Postion evaluation (depth 3):", minimax(depth = 3),"\n")

### $$\alpha$$-$$\beta$$ pruning

minimaxab <- function(pos='w1', depth = 1, player = TRUE,
alpha = -Inf, beta = +Inf){
if (depth == 0 || gameOver(pos)) {
return (evaluate (pos))
}

if (player) {
eval.max = -Inf;
for (p in getPositions(pos)){
eval.p = minimaxab(p, depth-1, FALSE, alpha, beta)
eval.max = max(eval.max, eval.p)
alpha = max(alpha, eval.p)
if (beta <= alpha) {break}
}
return (eval.max)
} else {
eval.min = +Inf;
for (p in getPositions(pos)){
eval.p = minimaxab(p, depth-1, TRUE, alpha, beta)
eval.min = min(eval.min, eval.p)
}
beta = min( beta, eval.p)
if (beta <= alpha) {break}
return (eval.min)
}
}

### Test function
minimaxab(depth = 3, alpha = -Inf, beta = +Inf)

### Practice makes perfect

Write a recursive function that computes factorial powers facPow for positive integers. (Note this function exists as factorial.) Test the function for $$n=5$$.

$n! = n (n-1)!, ~ 0!=1, n>1$

facPow <- function(n){
if (n==0) return(1) else return ( n * facPow (n-1) )
}
facPow (5)

Next implement the Fibonacci numbers as recursive function (see previous loop tutorial).

## Reasoning under Uncertainty - BBN

Reasoning under Uncertainty with Bayesian Belief Networks (BBN).

An interactive environment can be found using the library BayesianNetwork and the function BayesianNetwork().

The tutorial is based on an example from the bnlearn.com site.

Here, is a quick data description (Ness (2019)):

• Age (A): It is recorded as young (young) for individuals below 30 years, adult (adult) for individuals between 30 and 60 years old, and old (old) for people older than 60.
• Sex (S): The biological sex of individual, recorded as male (M) or female (F).
• Education (E): The highest level of education or training completed by the individual, recorded either high school (high) or university degree (uni).
• Occupation (O): It is recorded as an employee (emp) or a self employed (self) worker.
• Residence (R): The size of the city the individual lives in, recorded as small (small) or big (big).
• Travel (T): The means of transport favoured by the individual, recorded as car (car), train (train) or other (other)

### Defining the network

#library(bnlearn)
dag <- empty.graph(nodes = c("A","S","E","O","R","T"))
arc.set <- matrix(c("A", "E",
"S", "E",
"E", "O",
"E", "R",
"O", "T",
"R", "T"),
byrow = TRUE, ncol = 2,
dimnames = list(NULL, c("from", "to")))
arcs(dag) <- arc.set
# nodes(dag)
plot(dag)

### States and Probabilities

We defined the states and probabilities and display all conditional probability tables (CPTs).

A.lv <- c("young", "adult", "old")
S.lv <- c("M", "F")
E.lv <- c("high", "uni")
O.lv <- c("emp", "self")
R.lv <- c("small", "big")
T.lv <- c("car", "train", "other")

A.prob <- array(c(0.3,0.5,0.2), dim = 3, dimnames = list(A = A.lv))
S.prob <- array(c(0.6,0.4), dim = 2, dimnames = list(S = S.lv))
E.prob <- array(c(0.75,0.25,0.72,0.28,0.88,0.12,0.64,0.36,0.70,0.30,0.90,0.10), dim = c(2,3,2), dimnames = list(E = E.lv, A = A.lv, S = S.lv))
O.prob <- array(c(0.96,0.04,0.92,0.08), dim = c(2,2), dimnames = list(O = O.lv, E = E.lv))
R.prob <- array(c(0.25,0.75,0.2,0.8), dim = c(2,2), dimnames = list(R = R.lv, E = E.lv))
T.prob <- array(c(0.48,0.42,0.10,0.56,0.36,0.08,0.58,0.24,0.18,0.70,0.21,0.09), dim = c(3,2,2), dimnames = list(T = T.lv, O = O.lv, R = R.lv))
cpt <- list(A = A.prob, S = S.prob, E = E.prob, O = O.prob, R = R.prob, T = T.prob)
cpt

### Applying the BBN

survey = custom.fit(dag, cpt)
#rbn(survey, n = 10)
newdata = data.frame(A = factor("young", levels = A.lv),
S = factor("F", levels = S.lv),
E = factor("uni", levels = E.lv),
O = factor("self", levels = O.lv),
R = factor("big", levels = R.lv))

predict(survey, node = "T", data = newdata)
cpquery(survey, event = (S == "M") & (T == "car"), evidence = (E == "high"))

## Resources

• An excellent introduction to learning methods is James et al. (2013). This book can also be found online.
• A classic introduction into the field of AI is Russell and Norvig (2020) book.

### Acknowledgment

This tutorial was created using RStudio, R, rmarkdown, and many other tools and libraries. The packages learnr and gradethis were particularly useful. I’m very grateful to Prof. Andy Field for sharing his disovr package, which allowed me to improve the style of this tutorial and get more familiar with learnr. Allison Horst wrote a very instructive blog “Teach R with learnr: a powerful tool for remote teaching”, which encouraged me to continue with learnr. By the way, I find her statistic illustrations amazing.

## References

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning. Vol. 112. Springer.

Russell, Stuart, and Peter Norvig. 2020. Artificial Intelligence: A Modern Approach (Pearson Series in Artifical Intelligence). London, England, UK: Pearson. https://www.amazon.com/Artificial-Intelligence-A-Modern-Approach/dp/0134610997.