Skip to content

Install Rstudio server on debian 11

In this tutorial, we will install Rstudio server on debian 11.

Install the Rstudio deb

You can find all the available version in ths page: https://posit.co/download/rstudio-server/

# install r
sudo apt-get install r-base

# gdebi-core lets you install local deb packages resolving and installing its dependencies
sudo apt-get install gdebi-core

# some dependency packages
sudo apt-get -y install libcurl4-gnutls-dev
sudo apt-get -y install libssl-dev

# get the rstudio-server deb file
wget https://download2.rstudio.org/server/jammy/amd64/rstudio-server-2023.09.1-494-amd64.deb

# install the deb file
sudo gdebi rstudio-server-2023.09.1-494-amd64.deb

# you should see the rstudio server is launched on port 8787
# for example, if your server ip is 10.50.5.67, you can access the web ui via
http://10.50.5.67:8787/

# you need a login and password to login

Basic configuration

After installation, the configuration files are located at /etc/rstudio: - /etc/rstudio/rserver.conf - /etc/rstudio/rsession.conf

# you can verify if the configuration is correct or not
sudo rstudio-server verify-installation

# or restart the service
sudo rstudio-server restart

Configure a user account

By default, Rstudio server use the linux system account to do the authentication. The good practice is to create a new account for rstudio.

If you want to only allow certain group to be able to login. You can edit the below config file

vim /etc/rstudio/rserver.conf

# in this example, only allow admin and rstudio-users to login rstudio server
auth-required-user-group=admin,rstudio-users
# create a user account 
sudo adduser rstudio

# if you don't specify a password. the password will be the same as login
# become root
sudo su - 
# change password of a user account
passwd rstudio

Install R packages

You need to run below command in R console

# install devtools
install.packages("devtools")

# install sparklyr
install.packages("sparklyr")

Trouble shoot

If you have encounter errors such as No package 'libxml-2.0' found while installing sparklyr. You need to run below command, it's a system dependency not R.

sudo apt-get install libxml2-dev 

Load packages to current R session

library(sparklyr)

library(dplyr)

Connect to a spark cluster

# set env var, if sparklyr can't find where is spark and hadoop
Sys.setenv(SPARK_HOME="/opt/spark/spark-3.4.1")
# R session can't load the env var by default
Sys.setenv(HADOOP_CONF_DIR="/opt/hadoop/etc/hadoop")

# custom spark session config
conf <- spark_config()

# if we add nothing into the conf, the spark default conf will be loaded
conf$spark.executor.memory <- "300M"
conf$spark.executor.cores <- 2
conf$spark.executor.instances <- 3
conf$spark.dynamicAllocation.enabled <- "false"

# set the queue of the yarn cluster
conf$spark.yarn.queue="prod"

# create a spark session
sc <- spark_connect(master = "yarn", version="3.4.1", spark_home = '/opt/spark/spark-3.4.1', config=conf)

Do some query

# read data from hdfs_optimisation.md
spark_read_csv(sc, name = "test", path = "hdfs://10.50.5.67:9000/user/rstudio/flights/airports.csv")

# do some analysis


# close the spark session
spark_disconnect(sc)