Six lines to install and start SparkR on Mac OS X Yosemite

I know many R users would like to try out SparkR without all the configuration hassle. With just these six lines you can start SparkR from both RStudio and the command line.


One line for Spark and SparkR

Apache Spark is a fast and general-purpose cluster computing system.

SparkR is an R package that provides a light-weight frontend to use Apache Spark from R.

Six lines to start SparkR

Run the first three lines in your command line.

brew update # If you don't have Homebrew, install it first
brew install hadoop # Install Hadoop
brew install apache-spark # Install Spark

At this point you can already start the SparkR shell from your command line.
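As a quick check before launching, the sketch below looks for the launcher on your PATH. It assumes Homebrew has linked Spark's `sparkR` script (the launcher that ships in Spark's bin folder) onto your PATH; `find_launcher` is a hypothetical helper name used only for this illustration.

```shell
# Print the full path of a launcher if it is on PATH; return non-zero otherwise.
# find_launcher is a hypothetical helper, not part of Spark or Homebrew.
find_launcher() {
  command -v "$1" 2>/dev/null
}

# Assumption: Homebrew linked Spark's sparkR script onto PATH.
if launcher="$(find_launcher sparkR)"; then
  echo "Found $launcher; type sparkR to open the SparkR shell"
else
  echo "sparkR is not on PATH; check the output of: brew info apache-spark"
fi
```

If the launcher is found, typing `sparkR` on its own should open the shell.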


If you would like to call it from RStudio, run the remaining three lines in R.

spark_path <- strsplit(system("brew info apache-spark", intern = TRUE)[4], " ")[[1]][1] # Get your Spark path (first field on line 4 of the brew info output)
.libPaths(c(file.path(spark_path, "libexec", "R", "lib"), .libPaths())) # Put the SparkR library folder first on R's library search path
library(SparkR) # Load the library
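If `library(SparkR)` cannot find the package, it may help to see which folder the R lines above point at. The sketch below prints that folder from the terminal. It uses `brew --prefix apache-spark` rather than parsing `brew info`, so the prefix it reports may differ in form from the one the R code extracts; `sparkr_lib_dir` is a hypothetical helper name.

```shell
# Print the folder where SparkR's R package is expected to live,
# given the Spark install prefix as the only argument.
# sparkr_lib_dir is a hypothetical helper for illustration.
sparkr_lib_dir() {
  printf '%s/libexec/R/lib\n' "$1"
}

if command -v brew >/dev/null 2>&1; then
  # Ask Homebrew for the apache-spark prefix instead of parsing brew info.
  sparkr_lib_dir "$(brew --prefix apache-spark)"
else
  echo "brew not found; call sparkr_lib_dir with your Spark prefix by hand"
fi
```

The printed folder should contain a `SparkR` directory; if it does not, the path in the R code probably needs adjusting for your install.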

That’s all.
The following should now run in RStudio:

sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)
df <- createDataFrame(sqlContext, iris)
head(df)
# Sepal_Length Sepal_Width Petal_Length Petal_Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa



The full code is available from here.
