Six lines to install and start SparkR on Mac OS X Yosemite

I know there are many R users who like to test out SparkR without all the configuration hassle. Just these six lines and you can start SparkR from both RStudio and command line.

Apache Spark is a fast and general-purpose cluster computing system

SparkR is an R package that provides a light-weight frontend to use Apache Spark from R

The first three lines should be called in your command line.

1
2
3
brew update # If you don't have homebrew, get it from here (http://brew.sh/)
brew install hadoop # Install Hadoop
brew install apache-spark # Install Spark

You can already start SparkR shell by typing this in your command line;

SparkR

1
2
3

If you like to call it from RStudio, execute the rest in R

spark_path <- strsplit(system(“brew info apache-spark”,intern=T)[4],’ ‘)[[1]][1] # Get your spark path .libPaths(c(file.path(spark_path,”libexec”, “R”, “lib”), .libPaths())) # Navigate to SparkR folder library(SparkR) # Load the library

1
2
3

That’s all.Now this should run in your RStudio

sc <- sparkR.init() sqlContext <- sparkRSQL.init(sc) df <- createDataFrame(sqlContext, iris) head(df)

1
2
3
4
5
6
7
# Sepal_Length Sepal_Width Petal_Length Petal_Width Species
# 1 5.1 3.5 1.4 0.2 setosa
# 2 4.9 3.0 1.4 0.2 setosa
# 3 4.7 3.2 1.3 0.2 setosa
# 4 4.6 3.1 1.5 0.2 setosa
# 5 5.0 3.6 1.4 0.2 setosa
# 6 5.4 3.9 1.7 0.4 setosa

Enjoy!

The full codes are available from here.