Loops and Pizzas

An Introduction to Loops in R –

First, if you are new to programming, you should know that loops are away to tell the computer that you want to repeat some operation for anumber of times. This is a very common task that can be found in manyprogramming languages. For example, let’s say you invited five friendsfor dinner at your home and the whole cost of four pizzas will be splitevenly. Assume now that you must give instructions to a computer oncalculating how much each one will pay at the end of dinner. For that,you need to sum up the individual tabs and divide by the number ofpeople. Your instructions to the computer could be: start with a valueof x=zero, take each individual pizza cost and sum it to x until allcosts are processed, dividing the result by the number of friends at theend.

The great thing about loops is that the length of it is dynamicallyset. Using the previous example, if we had 500 friends (and a largedinner table!), we could use the same instructions for calculating theindividual tabs. That means we can encapsulate a generic procedure forprocessing any given number of friends at dinner. With it, you have atyour reach a tool for the execution of any sequential process. In otherwords, you are the boss of your computer and, as long as you can writeit down clearly, you can set it to do any kind of repeated task for you.

Now, about the code, we could write the solution to the pizza problemin R as:

pizza.costs <- c(50, 80, 30, 60) # each cost of pizza
n.friends <- 5 # number of friends

x <- 0 # set first cost to zero
for (i.cost in pizza.costs) {
 x <- x + i.cost # sum it up
}

x <- x/n.friends # divide for average per friend
print(x)

## [1] 44

Don’t worry if you didn’t understand the code. We’ll get to thestructure of a loop soon.

Back to our case, each friend would pay 44 for the meal. We can checkthe result against function sum:

x == sum(pizza.costs)/n.friends

## [1] TRUE

The output TRUE shows that the results are equal.

The Structure of a Loop

Knowing how to use loops can be a powerful ally in a complex datarelated problem. Let’s talk more about how loops are defined in R. Thestructure of a loop in R follows:

for (i in i.vec){
 ...
}

In the previous code, command for indicates the beginning of a loop.Object i in (i in i.vec) is the iterator of the loop. Thisiterator will change its value in each iteration, taking each individualvalue contained in i.vec. Note the loop is encapsulated by curlybraces ({}). These are important, as they define where the loopstarts and where it ends. The indentation (use of bigger margins) isalso important for visual cues, but not necessary. Consider thefollowing practical example:

# set seq
my.seq <- seq(-5,5)

# do loop
for (i in my.seq){
 cat(paste('\nThe value of i is',i))
}

## 
## The value of i is -5
## The value of i is -4
## The value of i is -3
## The value of i is -2
## The value of i is -1
## The value of i is 0
## The value of i is 1
## The value of i is 2
## The value of i is 3
## The value of i is 4
## The value of i is 5

In the code, we created a sequence from -5 to 5 and presented a text foreach element with the cat function. Notice how we also broke theprompt line with '\n'. The loop starts with i=-5, execute commandcat(paste('\nThe value of i is', -5)), proceed to the next iterationby setting i=-4, rerun the cat command, and so on. At its finaliteration, the value of i is 5.

The iterated sequence in the loop is not exclusive to numericalvectors. Any type of vector or list may be used. See next:

# set char vec
my.char.vec <- letters[1:5]

# loop it!
for (i.char in my.char.vec){
 cat(paste('\nThe value of i.char is', i.char))
}

## 
## The value of i.char is a
## The value of i.char is b
## The value of i.char is c
## The value of i.char is d
## The value of i.char is e

The same goes for lists:

# set list
my.l <- list(x = 1:5, 
 y = c('abc','dfg'), 
 z = factor('A','B','C','D'))

# loop list
for (i.l in my.l){
 
 cat(paste0('\nThe class of i.l is ', class(i.l), '. '))
 cat(paste0('The number of elements is ', length(i.l), '.'))
 
}

## 
## The class of i.l is integer. The number of elements is 5.
## The class of i.l is character. The number of elements is 2.
## The class of i.l is factor. The number of elements is 1.

In the definition of loops, the iterator does not have to be the onlyobject incremented in each iteration. We can create other objects andincrement them using a simple sum operation. See next:

# set vec and iterators
my.vec <- seq(1:5)
my.x <- 5
my.z <- 10

for (i in my.vec){
 # iterate "manually"
 my.x <- my.x + 1
 my.z <- my.z + 2
 
 cat('\nValue of i = ', i, 
 ' | Value of my.x = ', my.x, 
 ' | Value of my.z = ', my.z)
}

## 
## Value of i = 1 | Value of my.x = 6 | Value of my.z = 12
## Value of i = 2 | Value of my.x = 7 | Value of my.z = 14
## Value of i = 3 | Value of my.x = 8 | Value of my.z = 16
## Value of i = 4 | Value of my.x = 9 | Value of my.z = 18
## Value of i = 5 | Value of my.x = 10 | Value of my.z = 20

Using nested loops, that is, a loop inside of another loop is alsopossible. See the following example, where we present all the elementsof a matrix:

# set matrix
my.mat <- matrix(1:9, nrow = 3)

# loop all values of matrix
for (i in seq(1,nrow(my.mat))){
 for (j in seq(1,ncol(my.mat))){
 cat(paste0('\nElement [', i, ', ', j, '] = ', my.mat[i,j]))
 }
}

## 
## Element [1, 1] = 1
## Element [1, 2] = 4
## Element [1, 3] = 7
## Element [2, 1] = 2
## Element [2, 2] = 5
## Element [2, 3] = 8
## Element [3, 1] = 3
## Element [3, 2] = 6
## Element [3, 3] = 9

A Real World Example

Now, the computational needs of the real world is far more complex thandividing a dinner expense. A practical example of using loops isprocessing data according to groups. Using an example from Finance, ifwe have a return dataset for several stocks and we want to calculate theaverage return of each stock, we can use a loop for that. In thisexample, we will use Yahoo Finance data from three stocks: FB, GE andAA. The first step is downloading it with package BatchGetSymbols.

library(BatchGetSymbols)

## Loading required package: rvest

## Loading required package: xml2

## Loading required package: dplyr

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
## filter, lag

## The following objects are masked from 'package:base':
## 
## intersect, setdiff, setequal, union

## 

my.tickers <- c('FB', 'GE', 'AA')

df.stocks <- BatchGetSymbols(tickers = my.tickers, 
 first.date = '2012-01-01', 
 freq.data = 'yearly')[[2]]

## 
## Running BatchGetSymbols for:
## tickers = FB, GE, AA
## Downloading data for benchmark ticker | Found cache file
## FB | yahoo (1|3) | Found cache file - Good job!
## GE | yahoo (2|3) | Found cache file - Nice!
## AA | yahoo (3|3) | Found cache file - You got it!

It worked fine. Let’s check the contents of the dataframe:

dplyr::glimpse(df.stocks)

## Observations: 21
## Variables: 10
## $ ticker "AA", "AA", "AA", "AA", "AA", "AA", "AA", ...
## $ ref.date 2012-01-03, 2013-01-02, 2014-01-02, 2015-...
## $ volume 2217410500, 2149575500, 2146821400, 268355...
## $ price.open 21.48282, 21.33864, 25.30359, 38.13561, 22...
## $ price.high 25.85628, 25.68807, 42.29280, 41.01921, 32...
## $ price.low 19.27206, 18.50310, 24.27030, 18.79146, 16...
## $ price.close 22.17969, 21.60297, 25.30359, 38.15964, 23...
## $ price.adjusted 20.89342, 20.62187, 24.48568, 37.24207, 23...
## $ ret.adjusted.prices NA, -0.01299715, 0.18736494, 0.52097326, -...
## $ ret.closing.prices NA, -0.02600212, 0.17130149, 0.50807215, -...

All financial data is there. Notice that the return series is availableat column ret.adjusted.prices.

Now we will use a loop to build a table with the mean return of eachstock:

# find unique tickers in column ticker
unique.tickers <- unique(df.stocks$ticker)

# create empty df
tab.out <- data.frame()

# loop tickers
for (i.ticker in unique.tickers){
 
 # create temp df with ticker i.ticker
 temp <- df.stocks[df.stocks$ticker==i.ticker, ]
 
 # row bind i.ticker and mean.ret
 tab.out <- rbind(tab.out, 
 data.frame(ticker = i.ticker,
 mean.ret = mean(temp$ret.adjusted.prices, na.rm = TRUE)))
 
}

# print result
print(tab.out)

## ticker mean.ret
## 1 AA 0.24663684
## 2 FB 0.35315566
## 3 GE 0.06784693

In the code, we used function unique to find out the names of all thetickers in the dataset. Soon after, we create an empty dataframe tosave the results and a loop to filter the data of each stocksequentially and average its returns. At the end of the loop, we usefunction rbind to paste the results of each stock with the results ofthe main table. As you can see, we can use the data to perform groupcalculations with loop.

By now, I must be forward in saying that the previous loop is by nomeans the best way of performing the data operation. What we just did byloops is called a split-apply-combine procedure. There are basefunction in R such as tapply, split and lapply/sapply that cando the same job but with a more intuitive and functional approach. Goingfurther, functions from package tidyverse can do the same procuedurewith an even more intuitive approach. In a future post I shall discussthis possibilities further.

I hope you guys liked the post. Got a question? Just drop it at thecomment section.

Related