Welcome to your second Data Analytics tutorial.

Please be sure you have done the datatypes tutorial.

We are continuing the adventure by learning essential survival skills.

First we need to know how to decide. Your life may depend on it. So, you need to be able to execute `if` in danger do the right thing `then` I will be fine `else` there is trouble. (R has no `then` keyword; the code in braces after the condition plays that role.) Once this is mastered, we need to learn how to *keep on going*. The `for` and `while` loops will help us with that.

Change the boolean variable to `FALSE` and you should be in trouble.

```
doing_the_right_thing = TRUE
# the variable is already TRUE or FALSE, so "== TRUE" is not needed
if (doing_the_right_thing) {
  cat('I will be fine.\n')
} else {
  cat('I am in trouble.\n')
}
```

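Warning: at the top level of a script, `else` must be on the same line as the closing `}` of the `if` block, i.e. write `} else`. Otherwise R treats the `if` statement as complete and reports an unexpected `else`.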
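To see this rule in action without breaking your session, the sketch below stores two versions of the same `if`/`else` as text and asks R's `parse()` whether each one is valid. The helper `parses_ok()` is a hypothetical name introduced only for this illustration.

```r
# Two versions of the same if/else, kept as text so R can try to parse them
good = "x = TRUE\nif (x) {\n  cat('yes\\n')\n} else {\n  cat('no\\n')\n}"
bad  = "x = TRUE\nif (x) {\n  cat('yes\\n')\n}\nelse {\n  cat('no\\n')\n}"

# hypothetical helper: TRUE if the text parses without an error
parses_ok <- function(code) {
  !inherits(try(parse(text = code), silent = TRUE), "try-error")
}

parses_ok(good)  # TRUE
parses_ok(bad)   # FALSE: "else" starts a new line after a complete if statement
```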

Create a boolean variable `rain` and set it to `FALSE`. Write an `if` clause that displays “There is a risk of flooding.” or “everything is fine” (don’t forget `\n`).

```
rain = FALSE
if (rain) {
  cat('There is a risk of flooding.\n')
} else {
  cat('everything is fine\n')
}
```
```

Print the numbers 1 to 50 on the console.

```
for (nb in 1:50){
  cat(nb, ", ", sep = "")  # print the number followed by a comma
}
```
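The loop above leaves a trailing comma after 50. A small sketch of two ways around this (our own variation, not part of the exercise): choose the separator inside the loop, or build the whole line at once with `paste(..., collapse = ", ")`.

```r
# Loop version: ", " between numbers, a newline after the last one
for (nb in 1:50) {
  cat(nb, if (nb < 50) ", " else "\n", sep = "")
}

# Vectorised version: paste() glues 1, 2, ..., 50 together with ", "
line = paste(1:50, collapse = ", ")
cat(line, "\n", sep = "")
```

In R the vectorised form is usually preferred, but the loop makes the logic explicit.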

Assume you have several machine learning algorithms: `rpart`, `knn`, `neuralnet`, `randomForest`, `lm` and `xgbTree`. Iterate through these algorithms.

```
algos = c('rpart', 'knn', 'neuralnet', 'randomForest', 'lm', 'xgbTree')
for (alg in algos){
  cat(alg, ", ", sep = "")
}
```

Now display the iteration number as well.

```
algos = c('rpart', 'knn', 'neuralnet', 'randomForest', 'lm', 'xgbTree')
nb = 1  # iteration counter
for (alg in algos){
  cat(nb, ". ", alg, "\n", sep = "")
  nb = nb + 1
}
```
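Instead of keeping a separate counter, you can also loop over the positions of the vector. This sketch uses `seq_along()`, which produces 1, 2, ..., `length(algos)`, and then picks each algorithm by its index.

```r
algos = c('rpart', 'knn', 'neuralnet', 'randomForest', 'lm', 'xgbTree')
# nb runs over the positions 1, 2, ..., 6; algos[nb] is the algorithm name
for (nb in seq_along(algos)) {
  cat(nb, ". ", algos[nb], "\n", sep = "")
}
```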

This will come in “handy” when comparing algorithms.

Loops are often used for aggregation purposes. For instance, you can compute the sum or product of multiple numbers.

Write a `for` loop that computes the cumulative product, i.e. multiply the numbers \(e \in \{2, 4, \dots, 10\}\) and store the result in the variable \(s\).

```
s = 1  # start with 1, the neutral element of multiplication
for (e in 2*(1:5)){  # e takes the values 2, 4, 6, 8, 10
  s = s*e
}
s
```
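As a check, the result should be \(2 \cdot 4 \cdot 6 \cdot 8 \cdot 10 = 3840\). The sketch below compares the loop with R's built-in `prod()`, and uses `cumprod()` to show the intermediate values that `s` takes during the loop.

```r
evens = seq(2, 10, by = 2)  # 2 4 6 8 10, the same values as 2*(1:5)
s = 1
for (e in evens) {
  s = s * e
}
s                # 3840
prod(evens)      # 3840, computed by the built-in function
cumprod(evens)   # 2 8 48 384 3840: the value of s after each step
```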

To decipher a code you will need the first seven Fibonacci numbers that are greater than two. The Fibonacci numbers are defined by

\[ F_1=1,~F_2=1,~F_n=F_{n-1}+F_{n-2} \quad \text{for } n \ge 3. \]

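```
# use fib rather than F: in R, F is a built-in shorthand for FALSE
fib = numeric(10)  # room for the first ten Fibonacci numbers
fib[1] = 1; fib[2] = 1
for (n in 3:10){
  fib[n] = fib[n-1] + fib[n-2]
}
fib[fib > 2]  # 3 5 8 13 21 34 55
```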
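The `for` loop relies on knowing in advance that ten terms are enough. A `while` loop can instead keep adding terms until seven numbers greater than two have been found. This is a sketch of that alternative; the name `fib` is our own choice.

```r
fib = c(1, 1)  # F_1 and F_2
# keep going while fewer than seven numbers exceed two
while (sum(fib > 2) < 7) {
  n = length(fib)
  fib = c(fib, fib[n] + fib[n - 1])  # append F_{n+1} = F_n + F_{n-1}
}
fib[fib > 2]  # 3 5 8 13 21 34 55
```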

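A `while` loop keeps going as long as its condition is `TRUE`. Here we keep moving while our energy level is above 5; every step costs 2 units of energy.

```
energy = 10
while (energy > 5){
  cat('Energy level', energy, '\n')
  energy = energy - 2  # each step uses up energy
}
```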
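A small variation (our own addition, with a hypothetical counter `steps`) shows how a `while` loop can also count how often it ran, which is useful when you do not know the number of iterations in advance.

```r
energy = 10
steps = 0  # how many steps we managed to take
while (energy > 5) {
  energy = energy - 2
  steps = steps + 1
}
cat('Steps taken:', steps, '- remaining energy:', energy, '\n')
```

Starting at 10, the energy drops to 8, 6 and then 4, so the loop stops after three steps.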

There are many built-in functions already. What are typical base functions? For instance, `sum()`, `mean()`, `min()` and `max()` (run `library(help = "base")` for a comprehensive list).
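Here is a quick sketch of these four functions applied to a small example vector (the same numbers are used again further down in this tutorial).

```r
x = c(65, 70, 24, 26, 36, 65, 83, 34, 42, 34)
sum(x)   # 479
mean(x)  # 47.9
min(x)   # 24
max(x)   # 83
```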

Let us re-implement `sum` as a script using a `for` loop.

```
x = c(1,2,3,4)  # given vector
s = 0  # our summation variable
for (k in seq_along(x)){  # k = 1, 2, ..., length(x)
  s = s + x[k]  # add element to sum
}
s  # display result
```

Now let us turn the above script into a function and name it `mySum()`.

```
# define a function
mySum <- function(x){
  s = 0
  for (k in seq_along(x)){
    s = s + x[k]
  }
  return(s)
}
# use the function
x = c(1,2,3,4)  # given vector
mySum(x)  # call the function
```

Let us reflect on the above function. `mySum` is the function name. We then assign (`<-`) a function block `function(x){}` to it, where `x` is the input variable. The function `return()` hands the result from within the function's body back to the environment (workspace) in which the function was called.
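A common way to loop over the positions of a vector is `1:length(x)`, but it misbehaves when `x` is empty: `1:0` counts *down* and yields `1 0`, so the loop body would run twice and read the missing element `x[1]`, which is `NA`. The sketch below shows why `seq_along(x)` is the safer choice.

```r
x = numeric(0)   # an empty vector
1:length(x)      # 1 0: a loop over this would run twice
seq_along(x)     # integer(0): a loop over this does nothing

s = 0
for (k in seq_along(x)) s = s + x[k]
s                # 0, the sum of no numbers
sum(x)           # 0, matching the built-in function
```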

Write the function `myMin`, which determines the minimum of a vector. Then define `x=c(65,70,24,26,36,65,83,34,42,34)` and return its minimum using your function.

```
myMin <- function(x){
  m = Inf  # start at infinity: every number is smaller than Inf
  for (k in seq_along(x)){
    if (x[k] < m) m = x[k]  # keep the smallest value seen so far
  }
  return(m)
}
# use the function
x = c(65,70,24,26,36,65,83,34,42,34)
myMin(x)  # 24
```
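Starting at `Inf` is a deliberate design choice: the first element always replaces it, and an empty vector simply returns `Inf`. The built-in `min()` behaves the same way on an empty vector, although it also issues a warning. A sketch comparing the two:

```r
myMin <- function(x) {
  m = Inf
  for (k in seq_along(x)) {
    if (x[k] < m) m = x[k]
  }
  return(m)
}
x = c(65, 70, 24, 26, 36, 65, 83, 34, 42, 34)
myMin(x) == min(x)                 # TRUE, both give 24
myMin(numeric(0))                  # Inf
suppressWarnings(min(numeric(0)))  # Inf, min() would also warn here
```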

What if we have several input variables? Let us write a function that multiplies three numbers.

```
multi3 <- function(a, b, c){
  return(a*b*c)
}
multi3(2, 3, 4)
```
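With several inputs, R matches arguments either by position or by name. This sketch calls the same function both ways; named arguments can be given in any order.

```r
multi3 <- function(a, b, c) {
  return(a * b * c)
}
multi3(2, 3, 4)              # 24, matched by position
multi3(c = 4, a = 2, b = 3)  # 24, matched by name
```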

What if we have several output variables? A function returns a single object, so we return the variables a, b and c together in a list.

```
ret3 <- function(){
  L = list()  # initialise an empty list (R has no `null`)
  L$a = 1; L$b = 2; L$c = 3
  return(L)
}
ret3()
```
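Once the list is returned, you can pull out the individual outputs by name. A short sketch, which also builds the list in a single `list()` call:

```r
ret3 <- function() {
  return(list(a = 1, b = 2, c = 3))
}
res = ret3()
res$a        # 1
res[["b"]]   # 2
names(res)   # "a" "b" "c"
```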

What should you do if you have written many functions that you will use several times in your project?

The easiest way is to collect them in one R file and then load them with the `source()` command.
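A minimal sketch of this workflow. In a real project you would keep a file such as `my_functions.R` (a hypothetical name) next to your analysis and write `source("my_functions.R")`; here the file is written to a temporary location so the example is self-contained.

```r
fun_file = tempfile(fileext = ".R")  # stands in for e.g. "my_functions.R"
writeLines(c(
  "mySum <- function(x) {",
  "  s = 0",
  "  for (k in seq_along(x)) s = s + x[k]",
  "  return(s)",
  "}"
), fun_file)

source(fun_file)      # runs the file, so mySum() now exists in the workspace
mySum(c(1, 2, 3, 4))  # 10
```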

What should you do if you have written many functions that you will use several times in many projects?

In this case it pays off to create a package.
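One way to get started is `package.skeleton()` from the `utils` package that ships with R: it writes the basic folder structure (`DESCRIPTION`, `NAMESPACE`, `R/`, `man/`) for the functions you name. This is only a sketch under assumptions: the package name `mytools` is hypothetical, and the skeleton is written to a temporary folder. The generated files still need editing before the package can be built and installed, and packages such as `usethis` offer a more guided route.

```r
# the function we want to share across projects
mySum <- function(x) {
  s = 0
  for (k in seq_along(x)) s = s + x[k]
  return(s)
}

pkg_path = tempdir()  # in practice: a folder of your choice
package.skeleton(name = "mytools", list = c("mySum"),
                 environment = environment(), path = pkg_path, force = TRUE)

# inspect the generated files
list.files(file.path(pkg_path, "mytools"), recursive = TRUE)
```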

- Another Data Analytics tutorial “Data Analytics Tutorial for Beginners - From Beginner to Pro in 10 Mins! - DataFlair” (2019)
- Brauer (2020) is a very short introduction to R
- Field (2021) is a great book to discover statistics using R
- Shah (2020) is a hands-on introduction to data science (Chapter 6 explains R)

This tutorial was created using RStudio, R, rmarkdown, and many other tools and libraries. The packages `learnr` and `gradethis` were particularly useful. I’m very grateful to Prof. Andy Field for sharing his discovr package, which allowed me to improve the style of this tutorial and to become more familiar with `learnr`. Allison Horst wrote a very instructive blog post, “Teach R with learnr: a powerful tool for remote teaching”, which encouraged me to continue with `learnr`. By the way, I find her statistics illustrations amazing.

Brauer, Claudia. 2020. “A-very-short-introduction-to-R.” *GitHub*. https://github.com/ClaudiaBrauer/A-very-short-introduction-to-R/blob/master/documents/A%20(very)%20short%20introduction%20to%20R.pdf.

“Data Analytics Tutorial for Beginners - From Beginner to Pro in 10 Mins! - DataFlair.” 2019. *DataFlair*. https://data-flair.training/blogs/data-analytics-tutorial.

Field, Andy P. 2021. *Discovering Statistics Using R and RStudio*. Second. London: Sage.

Shah, Chirag. 2020. *A Hands-on Introduction to Data Science*. Cambridge University Press.