Install RStudio Server on Debian 11¶
In this tutorial, we will install RStudio Server on Debian 11.
Install the RStudio deb¶
You can find all available versions on this page: https://posit.co/download/rstudio-server/
# install R
sudo apt-get install r-base
# gdebi-core installs local deb packages and resolves their dependencies
sudo apt-get install gdebi-core
# some dependency packages
sudo apt-get -y install libcurl4-gnutls-dev
sudo apt-get -y install libssl-dev
# get the rstudio-server deb file
# (this URL points to the jammy build; the download page lists the build for each distribution)
wget https://download2.rstudio.org/server/jammy/amd64/rstudio-server-2023.09.1-494-amd64.deb
# install the deb file
sudo gdebi rstudio-server-2023.09.1-494-amd64.deb
# RStudio Server should now be running on port 8787
# for example, if your server IP is 10.50.5.67, you can access the web UI at
http://10.50.5.67:8787/
# you need a username and password to log in
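To confirm from a terminal that the web UI is reachable, a small check like the one below can help. It is only a sketch: the helper names `rstudio_url` and `rstudio_is_up` are hypothetical, and it assumes `curl` is installed and that the server listens on the default port 8787 mentioned above.

```shell
# Hypothetical helper: build the web UI URL for a host, defaulting to port 8787.
rstudio_url() {
    printf 'http://%s:%s/\n' "$1" "${2:-8787}"
}

# Hypothetical helper: succeed if something answers HTTP on that URL.
# -f fails on HTTP errors (>= 400), -sS stays quiet but still prints errors.
rstudio_is_up() {
    curl -fsS -o /dev/null --max-time 5 "$(rstudio_url "$@")"
}

# Usage, with the example IP from this tutorial:
#   systemctl is-active rstudio-server
#   rstudio_is_up 10.50.5.67 && echo "web UI reachable"
```

The check only tells you that an HTTP server answered on that port. It does not verify that a login will succeed.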
Basic configuration¶
After installation, the configuration files are located at /etc/rstudio:
- /etc/rstudio/rserver.conf
- /etc/rstudio/rsession.conf
# verify that the installation and configuration are valid
sudo rstudio-server verify-installation
# restart the service so that configuration changes take effect
sudo rstudio-server restart
Configure a user account¶
By default, RStudio Server uses Linux system accounts for authentication. It is good practice to create a dedicated account for RStudio.
If you want to allow only certain groups to log in, edit the config file below.
sudo vim /etc/rstudio/rserver.conf
# in this example, only members of the admin and rstudio-users groups can log in to RStudio Server
auth-required-user-group=admin,rstudio-users
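To see which accounts this setting lets in, you can compare the configured groups with a user's group membership. The sketch below assumes a single `auth-required-user-group` line in the file. The helper names `allowed_groups` and `user_may_login` are hypothetical and are not part of RStudio Server.

```shell
# Hypothetical helper: print the groups listed in auth-required-user-group,
# one per line. The default path is the one used in this tutorial.
allowed_groups() {
    conf="${1:-/etc/rstudio/rserver.conf}"
    # Drop the key, split the value on commas, trim blanks, skip empty items.
    sed -n 's/^[[:space:]]*auth-required-user-group[[:space:]]*=[[:space:]]*//p' "$conf" \
        | tr ',' '\n' \
        | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
        | sed '/^$/d'
}

# Hypothetical helper: succeed if the user belongs to an allowed group.
user_may_login() {
    user="$1"
    conf="${2:-/etc/rstudio/rserver.conf}"
    # An unknown user can never log in.
    user_groups=$(id -nG "$user" 2>/dev/null) || return 1
    allowed=$(allowed_groups "$conf")
    # Assumption of this sketch: no group restriction means no group check.
    [ -z "$allowed" ] && return 0
    for g in $allowed; do
        for ug in $user_groups; do
            [ "$g" = "$ug" ] && return 0
        done
    done
    return 1
}

# Usage:
#   user_may_login rstudio && echo "rstudio is in an allowed group"
```

If the check fails for a new account, adding the account to one of the listed groups (for example with `sudo usermod -aG rstudio-users rstudio`, provided the group exists) should fix it.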
# create a user account
sudo adduser rstudio
# adduser prompts for a password interactively
# to change it later, become root
sudo su -
# change the password of a user account
passwd rstudio
Install R packages¶
Run the commands below in the R console.
# install devtools
install.packages("devtools")
# install sparklyr
install.packages("sparklyr")
Troubleshooting¶
If you encounter an error such as No package 'libxml-2.0' found while installing sparklyr, run the command below.
The missing piece is a system dependency, not an R package.
sudo apt-get install libxml2-dev
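You can look for missing development libraries before installing the R packages, since "No package ... found" errors come from `pkg-config` lookups. The sketch below assumes `pkg-config` is installed. The function name `missing_dev_packages` is hypothetical, and the module-to-package list covers only the three libraries mentioned in this tutorial.

```shell
# Hypothetical helper: print the Debian packages whose pkg-config module
# cannot be found. PKG_CONFIG may be set to use a different pkg-config binary.
missing_dev_packages() {
    # Each entry is "<pkg-config module>:<Debian package>".
    for entry in libcurl:libcurl4-gnutls-dev openssl:libssl-dev libxml-2.0:libxml2-dev; do
        module=${entry%%:*}
        package=${entry#*:}
        "${PKG_CONFIG:-pkg-config}" --exists "$module" 2>/dev/null || echo "$package"
    done
}

# Usage:
#   missing_dev_packages
#   sudo apt-get install -y $(missing_dev_packages)
```

An empty result means all three modules were found. If `pkg-config` itself is missing, every package is reported.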
Load packages to current R session¶
library(sparklyr)
library(dplyr)
Connect to a spark cluster¶
# set the environment variables if sparklyr cannot locate Spark and Hadoop
Sys.setenv(SPARK_HOME="/opt/spark/spark-3.4.1")
# the R session does not pick up these variables from the shell by default
Sys.setenv(HADOOP_CONF_DIR="/opt/hadoop/etc/hadoop")
# custom spark session config
conf <- spark_config()
# if we add nothing into the conf, the spark default conf will be loaded
conf$spark.executor.memory <- "300M"
conf$spark.executor.cores <- 2
conf$spark.executor.instances <- 3
conf$spark.dynamicAllocation.enabled <- "false"
# set the queue of the YARN cluster
conf$spark.yarn.queue <- "prod"
# create a spark session
sc <- spark_connect(master = "yarn", version = "3.4.1", spark_home = "/opt/spark/spark-3.4.1", config = conf)
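The `Sys.setenv` calls above only last for the current R session. R also reads `KEY=VALUE` lines from `~/.Renviron` at startup, so one way to make the variables permanent is to write them there from a shell. This is a minimal sketch: the helper name `set_renviron_var` is hypothetical, and the paths are the ones assumed throughout this tutorial.

```shell
# Hypothetical helper: set KEY=VALUE in an Renviron-style file,
# replacing an existing KEY line instead of appending a duplicate.
set_renviron_var() {
    file="$1"
    key="$2"
    value="$3"
    touch "$file"
    # Keep every line that does not define KEY (grep exits 1 if none remain).
    grep -v "^${key}=" "$file" > "${file}.tmp" || true
    # Append the new definition and swap the file into place.
    printf '%s=%s\n' "$key" "$value" >> "${file}.tmp"
    mv "${file}.tmp" "$file"
}

# Usage, run as the user who logs in to RStudio Server:
#   set_renviron_var "$HOME/.Renviron" SPARK_HOME /opt/spark/spark-3.4.1
#   set_renviron_var "$HOME/.Renviron" HADOOP_CONF_DIR /opt/hadoop/etc/hadoop
```

The change takes effect the next time an R session starts. Sessions that are already running keep their old environment.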
Run some queries¶
# read a CSV file from HDFS
spark_read_csv(sc, name = "test", path = "hdfs://10.50.5.67:9000/user/rstudio/flights/airports.csv")
# do some analysis
# close the spark session
spark_disconnect(sc)