Objects types and some useful R functions for beginners

All objects in R have a given type. You already know most of them, as these types are also usedin mathematics. Integers, floating point numbers, or floats, matrices, etc, are all objects youare already familiar with. But R has other, maybe lesser known data types (that you can find in alot of other programming languages) that you need to become familiar with. But first, we need tolearn how to assign a value to a variable. This can be done in two ways:

1
1
a <- 3

or

1
a = 3

in very practical terms, there is no difference between the two. I prefer using <- for assigningvalues to variables and reserve = for passing arguments to functions, for example:

1
spam <- mean(x = c(1,2,3))

I think this is less confusing than:

1
spam = mean(x = c(1,2,3))

but as I explained above you can use whatever you feel most comfortable with.

The numeric class

To define single numbers, you can do the following:

1
1
a <- 3

The class() function allows you to check the class of an object:

1
1
1
1
class(a)
1
1
1
1
1
## [1] "numeric"

Decimals are defined with the character .:

1
a <- 3.14

R also supports integers. If you find yourself in a situation where you explicitly need an integerand not a floating point number, you can use the following:

1
2
a <- as.integer(3)
class(a)
1
## [1] "integer"

The as.integer() function is very useful, because it converts its argument into an integer. Thereis a whole family of as.*() functions. To convert a into a floating point number again:

1
1
class(as.numeric(a))
1
1
1
1
1
## [1] "numeric"

There is also is.numeric() which tests whether a number is of the numeric class:

1
is.numeric(a)
1
1
1
1
1
## [1] TRUE

These functions are very useful, there is one for any of the supported types in R. Later, we are goingto learn about the {purrr} package, which is a very powerful package for functional programming. Thispackage includes further such functions.

The character class

Use " " to define characters (called strings in other programming languages):

1
a <- "this is a string"
1
1
1
1
class(a)
1
1
1
1
## [1] "character"

To convert something to a character you can use the as.character() function:

1
2
3
a <- 4.392

class(a)
1
1
1
1
1
## [1] "numeric"
1
class(as.character(a))
1
1
1
1
## [1] "character"

It is also possible to convert a character to a numeric:

1
2
3
a <- "4.392"

class(a)
1
1
1
1
## [1] "character"
1
1
class(as.numeric(a))
1
1
1
1
1
## [1] "numeric"

But this only works if it makes sense:

1
2
3
a <- "this won't work, chief"

class(a)
1
1
1
1
## [1] "character"
1
as.numeric(a)
1
## Warning: NAs introduced by coercion
1
## [1] NA

A very nice package to work with characters is {stringr}, which is also part of the {tidyverse}.

The factor class

Factors look like characters, but are very different. They are the representation of categoricalvariables. A {tidyverse} package to work with factors is {forcats}. You would rarely usefactor variables outside of datasets, so for now, it is enough to know that this class exists.We are going to learn more about factor variables in Chapter 4, by using the {forcats} package.

The Date class

Dates also look like characters, but are very different too:

1
as.Date("2019/03/19")
1
## [1] "2019-03-19"
1
class(as.Date("2019/03/19"))
1
## [1] "Date"

Manipulating dates and time can be tricky, but thankfully there’s a {tidyverse} package for that,called {lubridate}. We are going to go over this package in Chapter 4.

The logical class

This class is the result of logical comparisons, for example, if you type:

1
4 > 3
1
1
1
1
1
## [1] TRUE

R returns TRUE, which is an object of class logical:

1
2
k <- 4 > 3
class(k)
1
## [1] "logical"

In other programming languages, logicals are often called bools. A logical variable can only havetwo values, either TRUE or FALSE. You can test the truthiness of a variable with isTRUE():

1
2
k <- 4 > 3
isTRUE(k)
1
1
1
1
1
## [1] TRUE

How can you test if a variable is false? There is not a isFALSE() function (at least not without havingto load a package containing this function), but there is way to do it:

1
2
k <- 4 > 3
!isTRUE(k)
1
1
1
1
## [1] FALSE

The ! operator indicates negation, so the above expression could be translated as is k not TRUE?.There are other such operators, namely &, &&, |, ||. & means and and | stands for or.You might be wondering what the difference between & and && is? Or between | and ||? & and| work on vectors, doing pairwise comparisons:

1
2
3
one <- c(TRUE, FALSE, TRUE, FALSE)
two <- c(FALSE, TRUE, TRUE, TRUE)
one & two
1
## [1] FALSE FALSE TRUE FALSE

Compare this to the && operator:

1
2
3
one <- c(TRUE, FALSE, TRUE, FALSE)
two <- c(FALSE, TRUE, TRUE, TRUE)
one && two
1
1
1
1
## [1] FALSE

The && and || operators only compare the first element of the vectors and stop as soon as a the returnvalue can be safely determined. This is called short-circuiting. Consider the following:

1
2
3
4
one <- c(TRUE, FALSE, TRUE, FALSE)
two <- c(FALSE, TRUE, TRUE, TRUE)
three <- c(TRUE, TRUE, FALSE, FALSE)
one && two && three
1
1
1
1
## [1] FALSE
1
one || two || three
1
1
1
1
1
## [1] TRUE

The || operator stops as soon it evaluates to TRUE whereas the && stops as soon as it evaluates to FALSE.Personally, I rarely use || or && because I get confused. I find using | or & in combination with theall() or any() functions much more useful:

1
2
3
one <- c(TRUE, FALSE, TRUE, FALSE)
two <- c(FALSE, TRUE, TRUE, TRUE)
any(one & two)
1
1
1
1
1
## [1] TRUE
1
all(one & two)
1
1
1
1
## [1] FALSE

any() checks whether any of the vector’s elements are TRUE and all() checks if all elements of the vector areTRUE.

As a final note, you should know that is possible to use T for TRUE and F for FALSE but I would advise againstdoing this, because it is not very explicit.

Vectors and matrices

You can create a vector in different ways. But first of all, it is important to understand that avector in most programming languages is nothing more than a list of things. These things can benumbers (either integers or floats), strings, or even other vectors. A vector in R can only contain elements of onesingle type. This is not the case for a list, which is much more flexible. We will talk about lists shortly, butlet’s first focus on vectors and matrices.

The c() function

A very important function that allows you to build a vector is c():

1
a <- c(1,2,3,4,5)

This creates a vector with elements 1, 2, 3, 4, 5. If you check its class:

1
1
1
1
class(a)
1
1
1
1
1
## [1] "numeric"

This can be confusing: you where probably expecting a to be of class vector orsomething similar. This is not the case if you use c() to create the vector, because c()doesn’t build a vector in the mathematical sense, but a so-called atomic vector.Checking its dimension:

1
1
dim(a)
1
## NULL

returns NULL because an atomic vector doesn’t have a dimension.If you want to create a true vector, you need to use cbind() or rbind().

But before continuing, be aware that atomic vectors can only contain elements of the same type:

1
c(1, 2, "3")
1
## [1] "1" "2" "3"

because “3” is a character, all the other values get implicitly converted to characters. You haveto be very careful about this, and if you use atomic vectors in your programming, you have to makeabsolutely sure that no characters or logicals or whatever else are going to convert your atomicvector to something you were not expecting.

cbind() and rbind()

You can create a true vector with cbind():

1
a <- cbind(1, 2, 3, 4, 5)

Check its class now:

1
1
1
1
class(a)
1
## [1] "matrix"

This is exactly what we expected. Let’s check its dimension:

1
1
dim(a)
1
## [1] 1 5

This returns the dimension of a using the LICO notation (number of LInes first, the number of COlumns).

It is also possible to bind vectors together to create a matrix.

1
1
b <- cbind(6,7,8,9,10)

Now let’s put vector a and b into a matrix called matrix_c using rbind().rbind() functions the same way as cbind() but glues the vectors together by rows and not by columns.

1
2
matrix_c <- rbind(a,b)
print(matrix_c)
1
2
3
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 2 3 4 5
## [2,] 6 7 8 9 10

The matrix class

R also has support for matrices. For example, you can create a matrix of dimension (5,5) filledwith 0’s with the matrix() function:

1
matrix_a <- matrix(0, nrow = 5, ncol = 5)

If you want to create the following matrix:

[B = \left(\begin{array}{ccc}2 & 4 & 3 \1 & 5 & 7\end{array} \right)]

you would do it like this:

1
B <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 2, byrow = TRUE)

The option byrow = TRUE means that the rows of the matrix will be filled first.

You can access individual elements of matrix_a like so:

1
matrix_a[2, 3]
1
## [1] 0

and R returns its value, 0. We can assign a new value to this element if we want. Try:

1
matrix_a[2, 3] <- 7

and now take a look at matrix_a again.

1
print(matrix_a)
1
2
3
4
5
6
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0 0 0 0 0
## [2,] 0 0 7 0 0
## [3,] 0 0 0 0 0
## [4,] 0 0 0 0 0
## [5,] 0 0 0 0 0

Recall our vector b:

1
1
b <- cbind(6,7,8,9,10)

To access its third element, you can simply write:

1
b[3]
1
## [1] 8

I have heard many people praising R for being a matrix based language. Matrices are indeed useful,and statisticians are used to working with them. However, I very rarely use matrices in myday to day work, and prefer an approach based on data frames (which will be discussed below). Thisis because working with data frames makes it easier to use R’s advanced functional programminglanguage capabilities, and this is where R really shines in my opinion. Working with matricesalmost automatically implies using loops and all the iterative programming techniques, à la Fortran,which I personally believe are ill-suited for interactive statistical programming (as discussed inthe introduction).

The list class

The list class is a very flexible class, and thus, very useful. You can put anything inside a list,such as numbers:

1
list1 <- list(3, 2)

or other lists constructed with c():

1
list2 <- list(c(1, 2), c(3, 4))

you can also put objects of different classes in the same list:

1
list3 <- list(3, c(1, 2), "lists are amazing!")

and of course create list of lists:

1
my_lists <- list(list1, list2, list3)

To check the contents of a list, you can use the structure function str():

1
str(my_lists)
1
2
3
4
5
6
7
8
9
10
11
## List of 3
## $ :List of 2
## ..$ : num 3
## ..$ : num 2
## $ :List of 2
## ..$ : num [1:2] 1 2
## ..$ : num [1:2] 3 4
## $ :List of 3
## ..$ : num 3
## ..$ : num [1:2] 1 2
## ..$ : chr "lists are amazing!"

or you can use RStudio’s Environment pane:

You can also create named lists:

1
list4 <- list("a" = 2, "b" = 8, "c" = "this is a named list")

and you can access the elements in two ways:

1
list4[[1]]
1
## [1] 2

or, for named lists:

1
list4$c
1
## [1] "this is a named list"

Lists are used extensively because they are so flexible. You can build lists of datasets and applyfunctions to all the datasets at once, build lists of models, lists of plots, etc… In the laterchapters we are going to learn all about them. Lists are central objects in a functional programmingworkflow for interactive statistical analysis.

The data.frame and tibble classes

In the next chapter we are going to learn how to import datasets into R. Once you import data, theresulting object is either a data.frame or a tibble depending on which package you used toimport the data. tibbles extend data.frames so if you know about data.frame objects already,working with tibbles will be very easy. tibbles have a better print() method, and some otherniceties.

However, I want to stress that these objects are central to R and are thus very important; they areactually special cases of lists, discussed above. There are different ways to print a data.frame ora tibble if you wish to inspect it. You can use View(my_data) to show the my_data data.framein the View pane of RStudio:

You can also use the str() function:

1
str(my_data)

And if you need to access an individual column, you can use the $ sign, same as for a list:

1
my_data$col1

Formulas

We will learn more about formulas later, but because it is an important object, it is useful if youalready know about them early on. A formula is defined in the following way:

1
2
3
my_formula <- ~x

class(my_formula)
1
## [1] "formula"

Formula objects are defined using the ~ symbol. Formulas are useful to define statistical models,for example for a linear regression:

1
lm(y ~ x)

or also to define anonymous functions, but more on this later.