Introduction to R

R is a powerful and versatile open source programming language and software environment that supports statistical analysis, data science, and data visualisation. We use R for ca. 95% of the data processing and analysis that we undertake in TESS Lab. This page signposts useful resources for using R at (relatively) introductory, intermediate, and advanced levels. See the related webpages for geospatial and remote sensing analysis in R and data sources.


Why R?

R is an open source programming language and software environment widely used for statistical analysis, data science and data visualisation. A large and rapidly evolving ecosystem of extension packages supports spatial data formats and analytical workflows for geospatial and remote sensing applications. We use R to develop programmatic analysis workflows because it offers several important advantages:

  • R is free and open source, allowing workflows and technical skills developed in R to be reused across projects in both research and commercial contexts. This contrasts with proprietary platforms such as ArcGIS Pro or Google Earth Engine, which operate under more restrictive licensing models.
  • Programmatic workflows improve transparency, reproducibility, and analytical robustness. Explicit code enables better error detection, version control, and documentation of analytical decisions, which strengthens scientific rigour and supports collaborative research.
  • R workflows scale more efficiently than analyses conducted in graphical user interface environments. Scripts can be deployed across different computing environments, including our high-performance workstations and cloud infrastructure, so that computationally intensive analyses run on whichever hardware suits them best.
  • Code, workflows, and documentation can be easily archived and shared, supporting open research principles and facilitating reproducibility and knowledge transfer.
  • R’s modular package ecosystem allows researchers to combine existing tools to develop bespoke analytical workflows tailored to specific project requirements.
  • Workflows can be readily updated to incorporate new datasets or methods as they become available, enabling flexible and adaptive research pipelines.
  • Working programmatically supports iterative development, making it straightforward to extend previous analyses and share workflows with project partners to maximise the operational value of research outputs.

R is cross-platform and runs on Windows, macOS, and Linux. It can be downloaded from the Comprehensive R Archive Network (CRAN). Most users interact with R through an integrated development environment; we recommend RStudio, which provides a free desktop version. While other programming languages, such as Python, are also widely used for geospatial analysis, they often require more complex environment configuration. Focusing on a single language can reduce barriers to entry, and many cutting-edge statistical and ecological modelling tools are first released within the R ecosystem.

NB: This page is a work in progress.

Introductory Training Resources

Training Resources

Useful packages

Tips

Intermediate Training Resources

Advanced Training Resources

Project management in R
  • Efficient R Programming is a really helpful open-source book, covering things like workflow, data input/output, coding style, time management as well as code efficiency.
  • Quarto is a multi-language, next-generation version of R Markdown from Posit (formerly RStudio), with many new features and capabilities. Quarto uses knitr to execute R code and is therefore able to render most existing Rmd files without modification. Quarto is a powerful tool for streamlining the production of multiple types of outputs (e.g., reports and presentations) from code.
  • renv: a dependency management toolkit for R, aiding reproducible data science.
  • targets: a pipeline tool for coordinating the pieces of computationally demanding analysis projects (and geotargets for more intuitive handling of geospatial data formats).
mlr3 (Machine Learning in R)

Intro to R, RStudio and Projects (after https://ourcodingclub.github.io/tutorials/intro-to-r/)

If you are new to using R, you might want to check out:

Foundational training resources


Key Packages

Basic data manipulation
  • tidyverse: a set of packages that work in harmony because they share common data representations and API design.
Cartography and data visualisation
  • ggplot2 for creating figures using the “grammar of graphics”.
  • patchwork makes it easy to create multi-panel figures from ggplot objects.
  • Colour palettes for colourblind-friendly data visualisations.
    • grDevices::hcl.colors (a base R option for colour palettes without dependencies).
    • viridisLite for colourblind-friendly colour palettes.
    • RColorBrewer for using colour schemes created by Cynthia Brewer.
    • colorspace for manipulating and assessing colours and palettes, including mapping between assorted colour spaces and emulating colour vision deficiencies.
    • scico: colour palettes based on the Scientific Colour-Maps. Colour choice in information visualisation is important in order to avoid being misled by inherent bias in the palette used. The scico package provides access to the perceptually uniform and colour-blindness-friendly palettes developed by Fabio Crameri and released under the “Scientific Colour-Maps” moniker. The package contains 24 different palettes, including both diverging and sequential types.
    • cols4all: colour palettes for all people, including those with colour vision deficiency. Popular colour palette series have been organised by type and scored on several properties, such as colour-blind-friendliness and fairness (i.e. do colours stand out equally?). Your own palettes can also be loaded and analysed. Besides the common palette types (categorical, sequential, and diverging), it also includes bivariate colour palettes. A colour for missing values is also assigned to each palette.
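As a starting point before installing any of the palette packages above, the sketch below uses only the base R function grDevices::hcl.colors() (available from R 3.6.0). The palette names "viridis" and "Blue-Red" and the numbers of colours are illustrative choices, not recommendations from this page.

```r
# Five colours from the sequential "viridis" HCL palette (no extra packages needed)
pal <- grDevices::hcl.colors(5, palette = "viridis")
print(pal)

# Seven colours from a diverging palette, e.g. for anomalies around zero
div <- grDevices::hcl.colors(7, palette = "Blue-Red")
print(div)

# List the names of the diverging palettes that hcl.colors() accepts
print(grDevices::hcl.palettes(type = "diverging"))

# Preview the sequential palette as colour swatches when working interactively
if (interactive()) {
  barplot(rep(1, length(pal)), col = pal, border = NA, axes = FALSE)
}
```

The returned values are ordinary hex colour strings, so they can be passed to any plotting function that accepts a vector of colours.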

Other Tips

Note that install.packages() sometimes fails to detect and install all dependencies, resulting in installation errors. One solution is to use pak::pkg_install() from the pak package, which is more robust in checking for dependencies.
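A minimal sketch of the pak approach is shown below. The helper names missing_packages() and install_missing() are hypothetical (they are not part of pak), and the package list in the example call is illustrative. The example call is left commented out because it needs an internet connection.

```r
# Hypothetical helper: return the packages in `pkgs` that are not yet installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install whatever is missing using pak
install_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) == 0) {
    message("All packages already installed.")
    return(invisible(character(0)))
  }
  # pak itself has to be installed the conventional way first
  if (!requireNamespace("pak", quietly = TRUE)) {
    install.packages("pak")
  }
  pak::pkg_install(to_install)
  invisible(to_install)
}

# Example call (illustrative package list; needs an internet connection):
# install_missing(c("ggplot2", "patchwork", "viridisLite"))
```

Checking what is missing first keeps repeated runs of a setup script fast, since pak is only called when something actually needs installing.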