Introduction to R
R is a powerful and versatile open source programming language and software environment that supports statistical computing, data science, data visualisation, machine learning, and geospatial analysis. We use R for approximately 95% of the data processing and analysis that we undertake in TESS Lab. This page signposts useful resources for using R at (relatively) introductory, intermediate, and advanced levels. See the related webpages on geospatial and remote sensing analysis in R and on data sources.
Why R?
Beyond general statistics and data science, R has a large and rapidly evolving ecosystem of extension packages that support spatial data formats and analytical workflows for geospatial and remote sensing applications. We use R to develop programmatic analysis workflows because it offers several important advantages:
- R is free and open source, so workflows and technical skills developed in R can be reused across projects in both research and commercial contexts. This contrasts with proprietary platforms such as ArcGIS Pro or Google Earth Engine, which operate under more restrictive licensing models.
- Programmatic workflows improve transparency, reproducibility, and analytical robustness. Explicit code enables better error detection, version control, and documentation of analytical decisions, which strengthens scientific rigour and supports collaborative research.
- R workflows scale more efficiently than analyses conducted in graphical user interface (GUI) environments. Scripts can be deployed across different computing environments, including our high-performance workstations and cloud infrastructure, so computationally intensive analyses can be run more efficiently.
- Code, workflows, and documentation can be easily archived and shared, supporting open research principles and facilitating reproducibility and knowledge transfer.
- R’s modular package ecosystem allows researchers to combine existing tools to develop bespoke analytical workflows tailored to specific project requirements.
- Workflows can be readily updated to incorporate new datasets or methods as they become available, enabling flexible and adaptive research pipelines.
- Working programmatically supports iterative development, making it straightforward to extend previous analyses and share workflows with project partners to maximise the operational value of research outputs.
R is cross-platform and runs on Windows, macOS, and Linux. It can be downloaded from the Comprehensive R Archive Network (CRAN). Most users interact with R through an integrated development environment (IDE); we recommend RStudio, which is available as a free desktop version. Other programming languages, such as Python, are also widely used for geospatial analysis, but they often require more complex environment configuration. Focusing on a single language can lower the barrier to entry, and many cutting-edge statistical and ecological modelling tools are released first within the R ecosystem.
NB. This page is a work in progress.
Introductory Training Resources
- For learning R for data science, the free online book “R for Data Science” (2nd ed.) by Hadley Wickham et al. is one of the best resources; once you have learned some fundamental terminology, much more help is available online.
- When reading CSV files, you will usually want to use readr::read_csv() rather than the older base R read.csv() function.
- A good starting point is Exeter’s Coding for Reproducible Research (CfRR) initiative, which includes quick quizzes to help you assess your R skills and identify which short courses would address any knowledge gaps; see the schedule of future CfRR courses.
- The Edinburgh Coding Club (https://ourcodingclub.github.io/) has developed high-quality tutorials introducing many foundational and advanced aspects of R relevant to the kinds of scientific approaches that we use in TESS Lab.
- (Beware that many of the geospatial tutorials there are outdated; see the TESS Geospatial page instead.)
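As a minimal sketch of why readr::read_csv() is usually preferred, the example below writes a small temporary CSV file (the file and its site, date and ndvi columns are hypothetical) and reads it with both functions. read_csv() returns a tibble and parses ISO-formatted dates automatically, whereas read.csv() returns a plain data.frame with dates left as text. It assumes the readr package is installed.

```r
library(readr)

# Write a small, hypothetical CSV file to a temporary location
csv_path <- tempfile(fileext = ".csv")
writeLines(c("site,date,ndvi",
             "A,2023-06-01,0.71",
             "B,2023-06-01,0.64"), csv_path)

# readr::read_csv() returns a tibble and parses ISO dates automatically
plots_readr <- read_csv(csv_path, show_col_types = FALSE)

# Base R read.csv() returns a data.frame; dates stay as character strings
plots_base <- read.csv(csv_path)

class(plots_readr$date)  # "Date"
class(plots_base$date)   # "character"
```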
Intermediate Training Resources
- **The Turing Way** contains modern language-agnostic advice for reproducibility, version control, data management, code style and much more.
- For more advanced spatial data analysis and statistics using terra, see “Spatial Data Science with R and ‘terra’”.
- Tutorial on extracting information about spatial patterns from spectral signatures.
- Tutorial on extracting landscape metrics in R by Jakub Nowosad.
- Demo introducing functional programming in R (Hugh Graham, 2024-01-22); ask Andy for access.
- For modelling and mapping species distributions, see (1) this tutorial on species occurrence and density maps, which uses GBIF and Flickr data to visualise species occurrence, and (2) the guide to species distribution modelling by Hijmans and Elith.
- Tutorial on manipulating and visualising occurrence data (cleaning occurrence data and customising graphs and maps).
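For a first taste of raster analysis with terra (the package covered by the “Spatial Data Science with R and ‘terra’” resource above), the sketch below builds a small synthetic raster, so no real data are assumed, smooths it with a 3×3 moving-window mean, and summarises the result. It assumes terra is installed; with real imagery you would instead load a file with terra::rast() and a file path of your own.

```r
library(terra)

# Synthetic 10 x 10 raster with cell values 1..100, filled row-wise from the top-left
r <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10, ymin = 0, ymax = 10,
          vals = 1:100)

# 3 x 3 moving-window (focal) mean; na.rm = TRUE lets edge cells use
# whichever neighbours fall inside the raster
r_smooth <- focal(r, w = 3, fun = "mean", na.rm = TRUE)

# Global summary statistics for the original and smoothed layers
global(r, "mean")
global(r_smooth, "mean", na.rm = TRUE)
```

Because the synthetic values increase linearly, each interior cell of the smoothed layer equals its original value, which makes the moving-window behaviour easy to check.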
Advanced Training Resources
Project management in R
- Efficient R Programming is a really helpful open-source book, covering things like workflow, data input/output, coding style, time management as well as code efficiency.
- Quarto is a multi-language, next-generation version of R Markdown from Posit (formerly RStudio), with many new features and capabilities. Quarto uses knitr to execute R code, so it can render most existing .Rmd files without modification. It is a powerful tool for streamlining the production of multiple types of output (e.g., reports and presentations) from code.
- renv: a dependency management toolkit for R that supports reproducible data science.
- targets: a pipeline tool that coordinates the pieces of computationally demanding analysis projects (see also geotargets for more intuitive handling of geospatial data formats).
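As a hedged sketch of how a targets pipeline is declared and run, the example below writes a minimal _targets.R script with targets::tar_script(), builds it with tar_make(), and reads one result back with tar_read(). The target names (toy_data, toy_model, toy_slope) and their toy data are hypothetical. It assumes targets and its callr dependency are installed, and it runs in a temporary directory so nothing is written into your project.

```r
library(targets)

# Use a fresh temporary directory so _targets.R and the _targets/ data store
# are not written into the current project
demo_dir <- file.path(tempdir(), "targets_demo")
dir.create(demo_dir, showWarnings = FALSE)
old_wd <- setwd(demo_dir)

# Write a minimal _targets.R: each tar_target() is one pipeline step, and
# dependencies are inferred from the object names each command uses
tar_script({
  library(targets)
  list(
    tar_target(toy_data, data.frame(x = 1:10, y = 2 * (1:10) + 1)),
    tar_target(toy_model, lm(y ~ x, data = toy_data)),
    tar_target(toy_slope, unname(coef(toy_model)["x"]))
  )
}, ask = FALSE)

# Build the pipeline (runs in a separate R process), then read a result back
tar_make()
slope <- tar_read(toy_slope)

setwd(old_wd)
slope
```

In a real project, _targets.R would live in the project root, and re-running tar_make() skips targets whose code and upstream dependencies have not changed.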
mlr3 (Machine Learning in R)
- The mlr3 ecosystem homepage and the open-source book “Applied Machine Learning Using mlr3 in R” (the best starting point for using mlr3).
- Introduction to mlr3 machine learning (Hugh Graham, 2023-06-02); for now, ask Andy for access.
- To account for spatial structure and autocorrelation in models, see (1) this seminar and live demo by Guy Lomax and Hugh Graham (2023-04-14) and (2) the chapter on spatiotemporal cross-validation (CV) in the mlr3 book.
- Blog post on Optimising feature selection with the Shadow Variable Search algorithm.
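A minimal sketch of the core mlr3 workflow (task, learner, resampling, measure), using the iris task and the rpart decision-tree learner that ship with mlr3. Note that this uses ordinary random cross-validation; for spatial data, the spatiotemporal CV chapter mentioned above describes spatially aware resampling strategies (provided by the mlr3spatiotempcv extension) that would replace rsmp("cv") here.

```r
library(mlr3)

set.seed(42)  # make the random fold assignment reproducible

task       <- tsk("iris")            # built-in classification task
learner    <- lrn("classif.rpart")   # decision tree (uses the rpart package)
resampling <- rsmp("cv", folds = 5)  # 5-fold random cross-validation

# Fit and evaluate the learner on each fold
rr <- resample(task, learner, resampling)

# Mean classification error across the folds
ce <- rr$aggregate(msr("classif.ce"))
ce
```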
- Intro to R, RStudio and Projects (based on https://ourcodingclub.github.io/tutorials/intro-to-r/)
- Troubleshooting in R (https://ourcodingclub.github.io/tutorials/troubleshooting/)
- Introduction to Functional Programming (based on https://ourcodingclub.github.io/tutorials/funandloops/)
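To illustrate the functional programming style referred to above, the sketch below replaces a for loop with base R's vapply(), Map() and Reduce(), which apply a function over the elements of a list rather than managing a loop index by hand. The plot-level NDVI values are hypothetical; purrr::map() and related functions offer the tidyverse equivalent.

```r
# Hypothetical NDVI observations from three plots
ndvi <- list(plot_a = c(0.61, 0.70, 0.68),
             plot_b = c(0.42, 0.45, 0.50),
             plot_c = c(0.80, 0.77, 0.83))

# Loop version: pre-allocate and fill by index
means_loop <- numeric(length(ndvi))
for (i in seq_along(ndvi)) {
  means_loop[i] <- mean(ndvi[[i]])
}
names(means_loop) <- names(ndvi)

# Functional version: vapply() applies mean() to each element and
# guarantees exactly one numeric value per plot
means_fun <- vapply(ndvi, mean, numeric(1))

# Map() iterates over several inputs in parallel (here, values and plot names)
labels <- Map(function(x, nm) sprintf("%s: %.2f", nm, max(x)), ndvi, names(ndvi))

# Reduce() combines elements pairwise, here pooling all observations
all_obs <- Reduce(c, ndvi)

identical(means_loop, means_fun)  # TRUE
```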
Foundational Training Resources
If you are new to using R, you might want to check out:
- Hadley Wickham’s tidyverse-based “R for Data Science” free book.
- The tutorials at https://ourcodingclub.github.io/tutorials.html for the basics (though note the warning above about superseded packages).
- Tutorials on using rstac to read and filter data from the Microsoft Planetary Computer, and a second related tutorial.
- Tutorial on geospatial vector data in R with sf (creating static and interactive maps using osmdata, sf, ggplot2 and tmap).
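As a minimal sketch of working with vector data in sf, the example below converts a data frame of hypothetical site coordinates into an sf object, buffers each site by 100 m, and measures distances and areas. The British National Grid (EPSG:27700) is used here only as an example projected CRS in metres; it assumes the sf package is installed.

```r
library(sf)

# Hypothetical survey sites with projected coordinates in metres
sites <- data.frame(site = c("A", "B", "C"),
                    x = c(300000, 301000, 302000),
                    y = c(100000, 100000, 101000))

# Convert to an sf object, declaring the coordinate reference system
sites_sf <- st_as_sf(sites, coords = c("x", "y"), crs = 27700)

# 100 m buffer polygons around each site
buffers <- st_buffer(sites_sf, dist = 100)

# Pairwise distances between sites and buffer areas (with units attached)
dist_m  <- st_distance(sites_sf)
area_m2 <- st_area(buffers)

dist_m
area_m2
```

The buffer areas come out slightly below the exact circle area because st_buffer() approximates each circle with a polygon.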
Key Packages
Basic data manipulation
- tidyverse: a set of packages that work in harmony because they share common data representations and API design.
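As a small illustration of the shared tidyverse style, the sketch below chains dplyr verbs with the native pipe (|>, available from R 4.1) to filter, group and summarise a hypothetical table of tree measurements. It assumes dplyr, one of the core tidyverse packages, is installed.

```r
library(dplyr)

# Hypothetical tree measurements from two plots
trees <- tibble(
  plot    = c("A", "A", "A", "B", "B"),
  species = c("oak", "ash", "oak", "oak", "beech"),
  dbh_cm  = c(42, 18, 55, 30, 61)
)

# Keep trees with DBH >= 20 cm, then summarise per plot
plot_summary <- trees |>
  filter(dbh_cm >= 20) |>
  group_by(plot) |>
  summarise(n_trees = n(), mean_dbh_cm = mean(dbh_cm), .groups = "drop")

plot_summary
```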
Cartography and data visualisation
- ggplot2 for creating figures using the “grammar of graphics”.
- patchwork makes it easy to create multi-panel figures from ggplot objects.
- Colour palettes for colourblind-friendly data visualisations:
- grDevices::hcl.colors(): a base R option for colour palettes with no extra dependencies.
- viridisLite for colourblind-friendly colour palettes.
- RColorBrewer for using colour schemes created by Cynthia Brewer.
- colorspace for manipulating and assessing colours and palettes, including mapping between colour spaces and emulating colour vision deficiencies.
- scico: colour palettes based on the Scientific Colour Maps. Colour choice in information visualisation is important to avoid being misled by inherent bias in the colour palette used. scico provides access to the perceptually uniform and colour-blindness-friendly palettes developed by Fabio Crameri and released under the “Scientific Colour-Maps” moniker. The package contains 24 different palettes, including both diverging and sequential types.
- cols4all: colour palettes for all people, including those with colour vision deficiency. Popular colour palette series are organised by type and scored on several properties, such as colour-blind-friendliness and fairness (i.e. whether the colours stand out equally). You can also load and analyse your own palettes. Besides the common palette types (categorical, sequential, and diverging), cols4all includes bivariate colour palettes, and each palette is assigned a colour for missing values.
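As a hedged sketch tying some of these visualisation packages together, the example below draws the same synthetic surface twice with ggplot2, once with a dependency-free grDevices::hcl.colors() palette ("YlGnBu", chosen arbitrarily) and once with ggplot2's built-in viridis scale, then combines the two panels with patchwork. The data are synthetic; it assumes ggplot2 and patchwork are installed.

```r
library(ggplot2)
library(patchwork)

# Synthetic gridded surface
grid <- expand.grid(x = 1:30, y = 1:30)
grid$z <- sin(grid$x / 5) * cos(grid$y / 5)

# Panel 1: base R HCL palette passed to a continuous gradient scale
p_hcl <- ggplot(grid, aes(x, y, fill = z)) +
  geom_raster() +
  scale_fill_gradientn(colours = hcl.colors(7, palette = "YlGnBu")) +
  coord_equal() +
  labs(title = "hcl.colors(): YlGnBu")

# Panel 2: ggplot2's built-in viridis colour scale
p_viridis <- ggplot(grid, aes(x, y, fill = z)) +
  geom_raster() +
  scale_fill_viridis_c(option = "viridis") +
  coord_equal() +
  labs(title = "scale_fill_viridis_c()")

# patchwork: `+` places ggplot objects side by side in one figure
combined <- p_hcl + p_viridis
combined
```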
Other Tips
Sometimes install.packages() does not properly detect and install all dependencies, resulting in installation errors. One solution is to use pak::pkg_install() from the pak package, which is a bit more robust in checking for dependencies.
