I know there are many R users who would like to try out SparkR without all the configuration hassle. With just these six lines, you can start SparkR from both RStudio and the command line.
Apache Spark is a fast and general-purpose cluster computing system.
SparkR is an R package that provides a light-weight frontend to use Apache Spark from R.
The first three lines should be run on your command line.
You can already start the SparkR shell by typing this on your command line:
SparkR
The remaining three lines should be run in R or RStudio:
spark_path <- strsplit(system("brew info apache-spark", intern = TRUE)[4], " ")[[1]][1]  # Get your Spark path
.libPaths(c(file.path(spark_path, "libexec", "R", "lib"), .libPaths()))  # Add the SparkR folder to the library paths
library(SparkR)  # Load the library
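To see what the path lookup does without needing Homebrew or Spark installed, here is a minimal sketch that applies the same string handling to a made-up line of `brew info apache-spark` output. The sample line, including the install prefix and version number, is an assumption for illustration only; the real output on your machine will differ, and the Cellar path may not always be on the fourth line.

```r
# Hypothetical example of the line that `brew info apache-spark` prints for
# the installed keg. The path and version here are assumptions.
info_line <- "/usr/local/Cellar/apache-spark/1.5.0 (649 files, 302.8M) *"

# Split on spaces and keep the first token, which is the install path.
spark_path <- strsplit(info_line, " ")[[1]][1]

# The SparkR package is expected under libexec/R/lib inside that path.
lib_dir <- file.path(spark_path, "libexec", "R", "lib")

# Prepending the folder means library(SparkR) would search it first.
# This only builds the vector; it does not change your library paths.
new_paths <- c(lib_dir, .libPaths())

print(spark_path)
print(lib_dir)
```

If `library(SparkR)` fails after running the real lines, printing `spark_path` like this is a quick way to check whether the lookup picked up the right line of the `brew info` output.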
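sc <- sparkR.init()  # Start a Spark context
sqlContext <- sparkRSQL.init(sc)  # Create a SQL context from it
df <- createDataFrame(sqlContext, iris)  # Convert the local iris data frame to a Spark DataFrame
head(df)  # Show the first rows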
Enjoy!
The full code is available here.