Introduction to R
R is a powerful and versatile open-source programming language and software environment that supports data analysis, data visualization, statistical computing, machine learning, and geospatial analysis. We use R for ca. 95% of the data processing and analysis that we undertake in TESS Lab. This page signposts useful resources for using R at introductory, intermediate, and advanced levels.
Why R?
We like using R in our workflows because:
- R is free and open-source software, so workflows and skills developed in R can be reused across projects irrespective of research or commercial applications (unlike tools like ArcPro or Google Earth Engine that have more restrictive licensing).
- Workflows can be deployed more efficiently and at greater scale than analyses implemented in Graphical User Interface (GUI) environments. For example, code can be migrated to different machines, such as our High-Performance Workstations, for more computationally intensive analysis.
- Explicit programmatic workflows improve error detection, accuracy, and robustness of scientific analysis and insights, and workflows can easily be archived and shared to ensure transparency and reproducibility, in line with Open Research principles.
- Workflows can easily be updated to incorporate new data or analytical techniques as they become available.
- It is easy to build on earlier work to develop new applications and share workflows with partners to maximise the value of insights for operational deployment.
- Packages can be combined to create bespoke tools for specific project needs.
R works across multiple operating systems and can be downloaded from the Comprehensive R Archive Network (CRAN). Most users work with R in an Integrated Development Environment (IDE); we recommend RStudio, which has a free desktop version. Other programming languages (e.g., Python) are also powerful for geospatial analysis, but we focus on R because Python environments are more complicated to configure (creating barriers to entry), it is easier to develop proficiency when focusing on a single language, and many of the latest statistical tools are developed in R.
Resources for using R:
- For learning R for data science, the free online book “R for Data Science” (2nd Ed.) by Hadley Wickham et al. is one of the best resources, and there is a lot more help available online once you have learned some fundamental terminology.
- When reading CSV files, you will usually want to use read_csv() (from the readr package, part of the tidyverse) rather than the older base R read.csv() function.
- The Edinburgh Code Club (https://ourcodingclub.github.io/) has developed high-quality tutorials introducing many foundational and advanced aspects of R that are relevant to the kinds of scientific approaches we use in TESS Lab. Beware that many of its geospatial tutorials are outdated; see the TESS Geospatial page instead.
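The difference between read.csv and read_csv noted in the list above can be seen in a small sketch. The file contents and column names here are made up for illustration, and the readr call assumes readr 2.0 or later is installed (it is skipped otherwise).

```r
# Write a small example CSV to a temporary file (the data are invented).
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("site,date,ndvi", "A,2024-01-15,0.42", "B,2024-02-20,0.57"),
  csv_path
)

# Base R: returns a data.frame. In R 4.0 or later the date column is
# read as character strings.
base_df <- read.csv(csv_path)
str(base_df)

# readr: returns a tibble and guesses column types, so ISO 8601 dates
# are parsed as Date values.
if (requireNamespace("readr", quietly = TRUE)) {
  tidy_df <- readr::read_csv(csv_path, show_col_types = FALSE)
  str(tidy_df)
}
```

Comparing the two str() outputs shows how the column types differ between the two readers.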
Training Materials
A good starting point is Exeter’s Coding for Reproducible Research initiative, which includes quick quizzes to rapidly assess R skills to help figure out which short courses might be useful to address knowledge gaps; see the schedule of future CfRR courses.
Example code from past and present TESS Lab projects is also available on the TESS Lab GitHub page (ask Andy if you want access to non-public repos).
- Intro to R, RStudio and Projects (after https://ourcodingclub.github.io/tutorials/intro-to-r/)
- Troubleshooting in R (https://ourcodingclub.github.io/tutorials/troubleshooting/)
- Introduction to Functional Programming (after https://ourcodingclub.github.io/tutorials/funandloops/)
Foundational training resources
- If you are new to using R, you might want to check out:
- Hadley Wickham’s tidyverse-based R for Data Science free book
- The tutorials at https://ourcodingclub.github.io/tutorials.html for the basics (though note the above warning to avoid superseded packages).
- Tutorials on using rstac to read and filter data from Microsoft Planetary Computer, and a second related tutorial.
- Tutorial on Geospatial vector data in R with sf (Creating static and interactive maps using osmdata, sf, ggplot2 and tmap).
Intermediate training resources
- The Turing Way contains modern language-agnostic advice for reproducibility, version control, data management, code style and much more.
- For more advanced spatial data analysis and statistics using terra, see “Spatial Data Science with R and terra”.
- Tutorial on extracting information about spatial patterns from spectral signatures.
- Tutorial on extracting Landscape metrics in R by Jakub Nowosad.
- Live Demo Introducing functional programming in R (Hugh Graham, 2024-01-22) – (TEMP ASK ANDY FOR ACCESS).
- For modelling and mapping species distributions, see (1) the tutorial on species occurrence and density maps, which uses GBIF and Flickr data to visualise species occurrence, and (2) the guide to species distribution modelling by Hijmans and Elith.
- Tutorial on Manipulation and visualisation of occurrence data (Cleaning occurrence data and customising graphs and maps).
Advanced training resources
Project management in R
- Efficient R Programming is a really helpful open-source book, covering things like workflow, data input/output, coding style, time management as well as code efficiency.
- Quarto is a multi-language, next-generation version of R Markdown from RStudio, with many new features and capabilities. Quarto uses Knitr to execute R code and is therefore able to render most existing Rmd files without modification. Quarto is a powerful tool for streamlining the production of multiple types of outputs (e.g., reports and presentations) from code.
- renv dependency management toolkit for R, aiding reproducible data science.
- targets pipeline tool coordinating the pieces of computationally demanding analysis projects (and geotargets for more intuitive handling of geospatial data formats).
MLR3 (Machine Learning in R)
- MLR3 Ecosystem homepage and the open source book Applied Machine Learning Using mlr3 in R (the best starting point for using MLR3).
- Introduction to MLR3 machine learning (by Hugh Graham, 2023-06-02) – (TEMP ASK ANDY FOR ACCESS).
- To account for spatial structure and autocorrelation in models, see (1) this seminar and live demo by Guy Lomax & Hugh Graham (2023-04-14), and (2) the chapter on spatiotemporal CV in the mlr3 book.
- Blog post on Optimising feature selection with the Shadow Variable Search algorithm.
Key Packages
Basic data manipulation
- tidyverse: a set of packages that work in harmony because they share common data representations and API design.
Cartography and data visualization
- ggplot2 for creating figures using the “grammar of graphics”.
- patchwork makes it easy to create multi-panel figures from ggplot objects.
- Colour palettes for colourblind-friendly data visualisations.
- grDevices::hcl.colors (a base R option for colour palettes without dependencies).
- viridisLite for colourblind-friendly colour palettes.
- RColorBrewer for using colour schemes created by Cynthia Brewer.
- colorspace for manipulating and assessing colours and palettes, including mapping between assorted colour spaces and emulating colour vision deficiencies.
- scico: colour palettes based on the Scientific Colour-Maps. Colour choice in information visualisation is important in order to avoid being misled by inherent bias in the colour palette used. The scico package provides access to the perceptually uniform and colour-blindness-friendly palettes developed by Fabio Crameri and released under the “Scientific Colour-Maps” moniker. The package contains 24 different palettes and includes both diverging and sequential types.
- cols4all: colour palettes for all people, including those with colour vision deficiency. Popular colour palette series have been organized by type and scored on several properties, such as colour-blind-friendliness and fairness (i.e., do colours stand out equally?). Your own palettes can also be loaded and analysed. Besides the common palette types (categorical, sequential, and diverging), it also includes bivariate colour palettes. A colour for missing values is assigned to each palette.
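As a minimal sketch of the dependency-free option in the list above, the following uses grDevices::hcl.colors() from base R (available in R 3.6 or later). The values being plotted are invented for illustration.

```r
# Five colours from the "viridis" palette shipped with base R.
pal <- grDevices::hcl.colors(5, palette = "viridis")
print(pal)

# Names of all palettes that hcl.colors() accepts.
available <- grDevices::hcl.pals()
head(available)

# Use the palette in a base graphics plot (made-up values).
values <- c(3, 7, 4, 9, 6)
barplot(values, col = pal, names.arg = LETTERS[1:5])
```

The same vector of hex codes can be passed to ggplot2, for example via scale_fill_manual(values = pal), if a ggplot figure is preferred.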
Other Tips
Note that the install.packages() function sometimes fails to pick up and install all dependencies, resulting in installation errors. One solution is to use pak::pkg_install() from the pak package, which is a bit more robust in checking for dependencies.
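One way to apply this tip is sketched below. The helper name install_if_missing and its dry_run argument are hypothetical, introduced only for illustration; pak::pkg_install() is the real function from pak. By default the helper only reports which packages are missing, so nothing is installed until dry_run is set to FALSE.

```r
# Hypothetical helper: find packages that are not installed and,
# optionally, install them with pak.
install_if_missing <- function(pkgs, dry_run = TRUE) {
  # requireNamespace() returns FALSE for packages that cannot be loaded
  # (note that it loads the namespace of those that can).
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!installed]

  if (length(to_install) > 0 && !dry_run) {
    # Install pak itself first if it is not yet available.
    if (!requireNamespace("pak", quietly = TRUE)) {
      install.packages("pak")
    }
    pak::pkg_install(to_install)
  }

  # Return the names of the missing packages invisibly.
  invisible(to_install)
}

# Dry run on packages that ship with R: nothing should be missing.
missing_pkgs <- install_if_missing(c("stats", "utils"))
print(missing_pkgs)
```

To install for real, call for example install_if_missing(c("terra", "sf"), dry_run = FALSE); this needs an internet connection.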
