An Introduction to Loops in R
First, if you are new to programming, you should know that loops are a way to tell the computer that you want to repeat some operation a number of times. This is a very common task that can be found in many programming languages. For example, let’s say you invited five friends for dinner at your home and the whole cost of four pizzas will be split evenly. Assume now that you must give instructions to a computer on calculating how much each one will pay at the end of dinner. For that, you need to sum up the individual costs and divide by the number of people. Your instructions to the computer could be: start with a value of x = 0, take each individual pizza cost and add it to x until all costs are processed, then divide the result by the number of friends.
The great thing about loops is that their length is set dynamically. Using the previous example, if we had 500 friends (and a large dinner table!), we could use the same instructions for calculating the individual tabs. That means we can encapsulate a generic procedure for processing any given number of friends at dinner. With it, you have at your reach a tool for the execution of any sequential process. In other words, you are the boss of your computer and, as long as you can write it down clearly, you can set it to do any kind of repeated task for you.
Now, about the code, we could write the solution to the pizza problem in R as:
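A minimal sketch of such a loop follows. The individual prices in pizza.costs are assumed values, chosen only so that the total splits evenly among five friends; the object names are illustrative as well.

```r
# Assumed prices of the four pizzas (hypothetical values)
pizza.costs <- c(50, 80, 30, 60)
n.friends <- 5

# Start with x equal to zero and add each pizza cost to it
x <- 0
for (i.cost in pizza.costs) {
  x <- x + i.cost
}

# Divide the total by the number of friends
x <- x / n.friends
print(x)
```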
Don’t worry if you didn’t understand the code. We’ll get to the structure of a loop soon.
Back to our case, each friend would pay 44 for the meal. We can check the result against function `sum`:
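One way to run this check is sketched below. It repeats the loop with the same assumed pizza prices (hypothetical values) and compares its result with a direct call to `sum`.

```r
# Assumed prices of the four pizzas (hypothetical values)
pizza.costs <- c(50, 80, 30, 60)
n.friends <- 5

# Result from the loop
x <- 0
for (i.cost in pizza.costs) {
  x <- x + i.cost
}
x <- x / n.friends

# Same calculation with sum()
result.sum <- sum(pizza.costs) / n.friends

# Compare both results
print(x == result.sum)
```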
The output `TRUE` shows that the results are equal.
The Structure of a Loop
Knowing how to use loops can be a powerful ally in a complex data-related problem. Let’s talk more about how loops are defined in R. The structure of a loop in R is as follows:
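A bare-bones sketch of that structure is shown here. The contents of i.vec and the `print` call in the body are placeholders; any vector and any set of commands could take their place.

```r
# Placeholder vector to iterate over (hypothetical values)
i.vec <- 1:3

for (i in i.vec) {
  # the commands to be repeated go here
  print(i)
}
```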
In the previous code, command `for` indicates the beginning of a loop. Object `i` in `(i in i.vec)` is the iterator of the loop. This iterator will change its value in each iteration, taking each individual value contained in `i.vec`. Note the loop is encapsulated by curly braces (`{}`). These are important, as they define where the loop starts and where it ends. The indentation (use of bigger margins) is not required, but it is an important visual cue. Consider the following practical example:
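A sketch of this example, iterating over a sequence from -5 to 5 and printing a message for each value (the name my.seq is an assumption):

```r
# Sequence from -5 to 5
my.seq <- seq(-5, 5)

for (i in my.seq) {
  # '\n' breaks the line before each message
  cat(paste('\nThe value of i is', i))
}
```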
In the code, we created a sequence from -5 to 5 and presented a text for each element with the `cat` function. Notice how we also broke the prompt line with `'\n'`. The loop starts with `i = -5`, executes the command `cat(paste('\nThe value of i is', -5))`, proceeds to the next iteration by setting `i = -4`, reruns the `cat` command, and so on. At its final iteration, the value of `i` is `5`.
The iterated sequence in the loop is not restricted to numerical vectors. Any type of vector or list may be used. See next:
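For instance, a loop over a character vector could look like the sketch below. The vector contents and object names are made up for illustration.

```r
# Character vector to iterate over (hypothetical values)
my.char.vec <- c('apple', 'banana', 'cherry')

for (i.char in my.char.vec) {
  cat(paste('\nThe value of i.char is', i.char))
}
```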
The same goes for lists:
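A small sketch with a list whose elements have different classes (the list contents are hypothetical). In each iteration the iterator holds one full element of the list:

```r
# List with elements of different classes (hypothetical values)
my.l <- list(x = 1:5,
             y = c('abc', 'dfg'),
             z = factor(c('A', 'B')))

for (i.l in my.l) {
  cat(paste0('\nThe class of i.l is ', class(i.l)))
}
```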
In the definition of loops, the iterator does not have to be the only object incremented in each iteration. We can create other objects and increment them using a simple sum operation. See next:
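As a sketch, the loop below increments two extra objects alongside the iterator. The starting values and step sizes are arbitrary choices for illustration.

```r
my.vec <- seq(1, 5)

# Extra objects, created before the loop (arbitrary starting values)
my.x <- 5
my.z <- 10

for (i in my.vec) {
  # increment both objects in every iteration
  my.x <- my.x + 1
  my.z <- my.z + 2

  cat('\nValue of i =', i,
      '| Value of my.x =', my.x,
      '| Value of my.z =', my.z)
}
```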
Using nested loops, that is, a loop inside another loop, is also possible. See the following example, where we present all the elements of a matrix:
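A sketch with a small 3 x 3 matrix (the matrix itself is an assumed example). The outer loop walks over the rows and the inner loop over the columns:

```r
# Example matrix (hypothetical values)
my.mat <- matrix(1:9, nrow = 3)

for (i in seq_len(nrow(my.mat))) {
  for (j in seq_len(ncol(my.mat))) {
    cat(paste0('\nElement [', i, ', ', j, '] = ', my.mat[i, j]))
  }
}
```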
A Real World Example
Now, the computational needs of the real world are far more complex than dividing a dinner expense. A practical example of using loops is processing data according to groups. Using an example from Finance, if we have a return dataset for several stocks and we want to calculate the average return of each stock, we can use a loop for that. In this example, we will use Yahoo Finance data from three stocks: FB, GE and AA. The first step is downloading it with package `BatchGetSymbols`.
It worked fine. Let’s check the contents of the dataframe:
All financial data is there. Notice that the return series is available at column `ret.adjusted.prices`.
Now we will use a loop to build a table with the mean return of each stock:
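The sketch below shows one way to write this loop. To keep it self-contained, it uses a small data frame with made-up returns in place of the downloaded data; the name df.stocks, the output column mean.ret and all numbers are assumptions, while the columns ticker and ret.adjusted.prices are meant to mirror the real dataset.

```r
# Made-up returns standing in for the downloaded data (hypothetical values)
df.stocks <- data.frame(
  ticker = rep(c('FB', 'GE', 'AA'), each = 3),
  ret.adjusted.prices = c(0.01, 0.02, 0.03,
                          -0.01, 0.00, 0.01,
                          0.02, 0.02, -0.01),
  stringsAsFactors = FALSE
)

# Names of all tickers in the dataset
my.tickers <- unique(df.stocks$ticker)

# Empty dataframe to hold the results
tab.out <- data.frame()

for (i.ticker in my.tickers) {
  # filter the rows of the current stock
  temp <- df.stocks[df.stocks$ticker == i.ticker, ]

  # average return; na.rm guards against missing returns
  temp.tab <- data.frame(ticker = i.ticker,
                         mean.ret = mean(temp$ret.adjusted.prices, na.rm = TRUE))

  # bind the row of this stock to the main table
  tab.out <- rbind(tab.out, temp.tab)
}

print(tab.out)
```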
In the code, we used function `unique` to find out the names of all the tickers in the dataset. Soon after, we created an empty dataframe to save the results and a loop to filter the data of each stock sequentially and average its returns. At the end of each iteration, we use function `rbind` to bind the results of the current stock to the main table. As you can see, we can use the data to perform group calculations with a loop.
By now, I must be forward in saying that the previous loop is by no means the best way of performing the data operation. What we just did with loops is called a split-apply-combine procedure. There are base functions in R, such as `tapply`, `split` and `lapply`/`sapply`, that can do the same job with a more intuitive and functional approach. Going further, functions from package `tidyverse` can do the same procedure with an even more intuitive approach. In a future post I shall discuss these possibilities further.
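As a brief illustration of the base R route, the sketch below computes group means with `tapply` and with `split` plus `sapply`. It uses a small data frame with made-up returns (df.stocks and all numbers are assumptions), so it runs without any download.

```r
# Made-up returns standing in for real stock data (hypothetical values)
df.stocks <- data.frame(
  ticker = rep(c('FB', 'GE', 'AA'), each = 3),
  ret.adjusted.prices = c(0.01, 0.02, 0.03,
                          -0.01, 0.00, 0.01,
                          0.02, 0.02, -0.01),
  stringsAsFactors = FALSE
)

# Option 1: tapply applies mean() to the returns of each ticker
mean.tapply <- tapply(df.stocks$ret.adjusted.prices,
                      df.stocks$ticker,
                      mean, na.rm = TRUE)

# Option 2: split the returns by ticker, then apply mean() to each piece
mean.sapply <- sapply(split(df.stocks$ret.adjusted.prices, df.stocks$ticker),
                      mean, na.rm = TRUE)

print(mean.tapply)
print(mean.sapply)
```

Both calls return one mean per ticker, with the ticker symbols as names, and neither needs an explicit loop or an empty table to fill.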
I hope you guys liked the post. Got a question? Just drop it in the comment section.