Geospatial and Remote Sensing Analysis in R
This page signposts useful resources for undertaking geospatial and remote sensing analysis in R. We aim to help people navigate the many options available and to help standardise the modern tools we reach for first, to aid mutual support and efficient progress within and beyond our TESS Lab community.
Why R?
R is an open-source programming language and platform that is widely used in statistical analysis, data science and data visualization. Many powerful extension packages have been developed to support spatial data formats for geospatial and remote sensing applications in R. We like using R to develop programmatic geospatial analysis workflows because:
- R is free and open-source software, so workflows and skills developed in R can be reused across projects irrespective of research or commercial applications (unlike tools like ArcPro or Google Earth Engine that have more restrictive licensing).
- Explicit programmatic workflows improve error detection, accuracy, and robustness of scientific analysis and insights.
- Workflows can be deployed more efficiently and at greater scale than analyses implemented in Graphical User Interface (GUI) environments. For example, code can be migrated to different machines for more computationally intensive analysis, such as our High-Performance Workstations.
- Workflows and documentation can easily be archived and shared to ensure transparency and reproducibility, aligned with Open Research principles.
- Packages can be combined to create bespoke tools for specific project needs.
- Workflows can easily be updated to incorporate new data or analytical techniques as they become available.
- It is easy to build on earlier work to develop new applications and share workflows with partners to maximise the value of insights for operational deployment.
R works across multiple operating systems, and can be downloaded from the Comprehensive R Archive Network (CRAN). Most users work with R in an Integrated Development Environment (IDE); we recommend RStudio, which has a free desktop version available here. While other programming languages (e.g., Python) are also powerful for geospatial analysis, we favour R because Python environments are more complicated to configure (creating barriers to entry), it is easier to develop proficiency when focusing on a single language, and many of the latest statistical tools are developed in R.
Training Materials
WARNING: The R spatial ecosystem has changed substantially in recent years; avoid using superseded packages such as sp, raster, rgdal, rgeos, maptools, or cartography for new projects.
The free online book “Geocomputation with R (2nd ed)” by Lovelace et al. is the single best resource for geospatial science, geographic data analysis, visualization and modelling in R, closely followed by “Spatial Data Science with R and ‘terra’” (for more advanced spatial data analysis and statistics).
A good starting point is Exeter’s Coding for Reproducible Research initiative, which includes quick quizzes to rapidly assess R skills to help figure out which short courses might be useful to address knowledge gaps; see the schedule of future CfRR courses.
Example code from past and present TESS Lab projects is also available on the TESS Lab GitHub page (ask Andy if you want access to non-public repos).
For making aesthetically pleasing, publication-quality figures with map elements (e.g., legends and scale bars), we recommend packages such as mapsf, ggplot2 with ggspatial, tmap, or mapview (the latter two are particularly good for interactive maps). For complex (multi-element) one-off figures, a desktop GIS such as QGIS is sometimes the better option.
Foundational training resources
- If you are new to using R, you might want to check out:
- Hadley Wickham’s tidyverse-based R for Data Science free book
- The tutorials at https://ourcodingclub.github.io/tutorials.html for the basics (though note the above warning to avoid superseded packages).
- Tutorials on using rstac to read and filter data from Microsoft Planetary Computer, and a second related tutorial.
- Tutorial on Geospatial vector data in R with sf (Creating static and interactive maps using osmdata, sf, ggplot2 and tmap).
Intermediate training resources
- The Turing Way contains modern language-agnostic advice for reproducibility, version control, data management, code style and much more.
- For more advanced spatial data analysis and statistics using terra, see “Spatial Data Science with R and ‘terra’”.
- Tutorial on extracting information about spatial patterns from spectral signatures.
- Tutorial on extracting Landscape metrics in R by Jakub Nowosad.
- Live Demo Introducing functional programming in R (Hugh Graham, 2024-01-22) – (TEMP ASK ANDY FOR ACCESS).
- For modelling and mapping species distribution, check out this (1) Tutorial on Species occurrence and density maps using GBIF and Flickr data to visualise species occurrence, and also (2) the guide to species distribution modelling by Hijmans and Elith.
- Tutorial on Manipulation and visualisation of occurrence data (Cleaning occurrence data and customising graphs and maps).
Advanced training resources
Project management in R
- Efficient R Programming is a really helpful open-source book, covering things like workflow, data input/output, coding style, time management as well as code efficiency.
- Quarto is a multi-language, next-generation version of R Markdown from RStudio, with many new features and capabilities. Quarto uses Knitr to execute R code and is therefore able to render most existing Rmd files without modification. Quarto is a powerful tool for streamlining the production of multiple types of outputs (e.g., reports and presentations) from code.
- renv dependency management toolkit for R, aiding reproducible data science.
- targets pipeline tool coordinating the pieces of computationally demanding analysis projects (and geotargets for more intuitive handling of geospatial data formats).
MLR3 (Machine Learning in R)
- MLR3 Ecosystem homepage and the open source book Applied Machine Learning Using mlr3 in R (the best starting point for using MLR3).
- Introduction to MLR3 machine learning (by Hugh Graham, 2023-06-02) – (TEMP ASK ANDY FOR ACCESS).
- For accounting for spatial structure and autocorrelation in models, check out (1) this seminar and live demo by Guy Lomax & Hugh Graham (2023-04-14), and (2) the chapter on spatiotemporal CV in the mlr3 book.
- Blog post on Optimising feature selection with the Shadow Variable Search algorithm.
Key Packages
Basic data manipulation
- tidyverse a set of packages that work in harmony because they share common data representations and ‘API’ design.
- sf (simple features for R) is the most popular package for encoding spatial data (functionally sf replaces the sp package).
- terra for working with raster (gridded) data (note that terra replaced the raster package). The package author has written an introductory guide to terra.
- gdalcubes Earth Observation Data Cubes from Satellite Image Collections, facilitating processing collections of Earth observation images as on-demand multispectral, multitemporal raster data cubes. Users define cubes by spatiotemporal extent, resolution, and spatial reference system and let ‘gdalcubes’ automatically apply cropping, reprojection, and resampling using the ‘Geospatial Data Abstraction Library’ (‘GDAL’).
- exactextractr quickly and accurately summarizes raster values over polygonal areas (“zonal statistics”). Best weighted-averaging tool!
- landscapemetrics calculates landscape metrics for categorical landscape patterns in a tidy workflow. The package works on ‘terra’ SpatRaster objects as inputs, and can visualize patches and select metrics and building blocks to develop new metrics. See more guidance on using landscapemetrics.
- yyjsonr is a fast JSON parser/serializer, which converts R data to/from JSON inc. geojson (ca. 2-10x faster than jsonlite at both reading and writing).
- whitebox the ‘WhiteboxTools’ library for advanced geospatial data analysis, such as cost-distance analysis, distance buffering, and raster reclassification. Remote sensing and image processing tasks include image enhancement (e.g. panchromatic sharpening, contrast adjustments), image mosaicing, numerous filtering operations, simple classification (k-means), and common image transformations. ‘WhiteboxTools’ also contains advanced tooling for spatial hydrological analysis (e.g. flow-accumulation, watershed delineation, stream network analysis, sink removal), terrain analysis (e.g. common terrain indices such as slope, curvatures, wetness index, hypsometric analysis; multi-scale topographic position analysis), and LiDAR data processing.
- countrycode listing all country names, including conversion between 40 different coding schemes and assigning region descriptors (and the more niche countries package can also be handy for wrangling particularly messy country name data).
- lidR is a toolbox to facilitate exploration, manipulation and visualization of LiDAR (Light Detection and Ranging) data. Read/write ‘las’ and ‘laz’ files, computation of metrics in area-based approach, point filtering, artificial point reduction, classification from geographic data, normalization, individual tree segmentation etc. The lidR documentation is a brilliant resource for maximising performance.
- lasR complements the lidR package and is designed for production, being optimized for memory and speed (using C++). It is much more efficient for common tasks such as producing CHMs and DTMs, and tree detection and segmentation over large areas.
- RStoolbox is a collection of remote sensing functions. The ‘coregisterImages’ function is handy for image-to-image co-registration based on mutual information (a function that is unavailable elsewhere), although most of its other functions have been superseded by other packages.
- maptools was a popular set of tools for manipulating geographic data, but it is now superseded (see the warning above); prefer sf or terra for new projects.
- ncdf4 for working with netCDF files.
- suncalc to compute sun position, sunlight phases (times for sunrise, sunset, dusk, etc.), moon position and lunar phase for any locations and times.
- gt build display tables from tabular data with an easy-to-use set of functions.
- shapefiles has functions for reading ESRI Shapefiles.
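As a minimal sketch of how sf, terra and exactextractr fit together, the example below computes the mean elevation of each polygon using the sample Luxembourg datasets bundled with terra. The object names (elev, lux, mean_elev, mean_elev_exact) are illustrative only, and the snippet assumes the three packages are installed; treat it as a starting point rather than a recommended workflow.

```r
library(sf)            # vector data
library(terra)         # raster data
library(exactextractr) # coverage-weighted zonal statistics

# Sample data shipped with terra: an elevation raster and administrative polygons
elev <- rast(system.file("ex/elev.tif", package = "terra"))
lux  <- st_read(system.file("ex/lux.shp", package = "terra"), quiet = TRUE)

# Make sure the polygons share the raster's coordinate reference system
lux <- st_transform(lux, crs(elev))

# Zonal mean with terra (by default uses cells whose centre falls inside each polygon)
terra_mean <- terra::extract(elev, vect(lux), fun = mean, na.rm = TRUE)
lux$mean_elev <- terra_mean[[2]]

# Zonal mean with exactextractr (weights each cell by the fraction covered)
lux$mean_elev_exact <- exact_extract(elev, lux, "mean", progress = FALSE)

head(st_drop_geometry(lux)[, c("NAME_2", "mean_elev", "mean_elev_exact")])
```

The two estimates will usually be close but not identical, because the coverage-weighted approach also accounts for cells that are only partly inside a polygon.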
Other analytical tools
- RobustLinearReg for fitting the Theil-Sen slope for robust regression (RobustLinearReg is ~66 times faster than the mblm package, which can’t handle NA values in a sample). The trend package is another option for non-parametric trend tests.
- WorldFlora is the best package for taxonomic harmonisation, reviewing and correcting plant species names, using the World Flora Online Taxonomic Backbone downloaded from worldfloraonline.org.
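To illustrate what a Theil-Sen fit does, here is a small base R sketch that takes the median of all pairwise slopes. It is not the RobustLinearReg API; the function name theil_sen() and its return format are hypothetical, and for real analyses one of the packages above is likely to be faster and better tested.

```r
# Hypothetical helper: Theil-Sen estimator as the median of pairwise slopes
theil_sen <- function(x, y) {
  # Drop incomplete pairs so NA values do not break the calculation
  ok <- complete.cases(x, y)
  x <- x[ok]
  y <- y[ok]
  stopifnot(length(x) >= 2)

  # All unique pairs of observations
  idx <- combn(length(x), 2)
  dx <- x[idx[2, ]] - x[idx[1, ]]
  dy <- y[idx[2, ]] - y[idx[1, ]]

  # Ignore pairs with identical x (undefined slope)
  keep <- dx != 0
  slope <- median(dy[keep] / dx[keep])

  # A common choice of intercept: median of the residual offsets
  intercept <- median(y - slope * x)

  list(intercept = intercept, slope = slope)
}

# Usage: a linear trend with one outlier and one missing value
x <- 1:10
y <- 2 * x + 1
y[10] <- 100
y[3] <- NA
theil_sen(x, y)
```

Because the estimate is a median, a single extreme value has far less influence on the slope than it would in an ordinary least-squares fit.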
Cartography and data visualization
- ggplot2 and ggspatial for creating plots and maps using the “grammar of graphics”. GGally can further extend ‘ggplot2’ by adding functions to reduce the complexity of combining geometric objects with transformed data.
- patchwork makes it easy to create multi-panel figures from ggplot objects.
- ggpmisc miscellaneous extensions to ‘ggplot2’, inc. annotations and plotting fitted models.
- Colour palettes for colourblind-friendly data visualisations.
- grDevices::hcl.colors (base R option for colour palettes without dependencies).
- viridisLite for colourblind-friendly colour palettes.
- RColorBrewer for using color schemes created by Cynthia Brewer.
- colorspace for manipulating and assessing colours and palettes (inc. mapping between assorted colour spaces and emulating colour vision deficiencies).
- scico: Colour Palettes Based on the Scientific Colour-Maps. Colour choice in information visualisation is important in order to avoid being misled by inherent bias in the palette used. The ‘scico’ package provides access to the perceptually uniform and colour-blindness friendly palettes developed by Fabio Crameri and released under the “Scientific Colour-Maps” moniker. The package contains 24 different palettes and includes both diverging and sequential types.
- cols4all: Colors for all. Colour palettes for all people, including those with colour vision deficiency. Popular colour palette series have been organized by type and have been scored on several properties such as colour-blind-friendliness and fairness (i.e. do colours stand out equally?). Users’ own palettes can also be loaded and analysed. Besides the common palette types (categorical, sequential, and diverging) it also includes bivariate colour palettes. Furthermore, a colour for missing values is assigned to each palette.
- tmap for thematic mapping using “grammar of graphics” syntax and basemap generation. See also this open source book on ‘Elegant and informative maps with tmap‘.
- mapsf Create and integrate thematic maps, with cartographic representations such as proportional symbols, choropleth or typology maps. It also offers several functions to display layout elements that improve the graphic presentation of maps (e.g. scale bar, north arrow, title, labels).
- leaflet for the creation of interactive web maps.
- mapview for creation of interactive spatial visualizations with pop-up windows for attribute data.
- ggmap web mapping with Google Maps or OpenStreetMap.
- gratia ‘ggplot’-based graphics and utility functions for working with generalized additive models (GAMs) fitted using the ‘mgcv’ package.
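As a rough sketch of a static map built with ggplot2 and ggspatial, the example below plots the North Carolina sample dataset bundled with sf, using a colourblind-friendly palette from base R's hcl.colors(). The object names (nc, pal, p), the choice of the BIR74 column and the styling options are illustrative assumptions, not a lab standard.

```r
library(sf)
library(ggplot2)
library(ggspatial)

# Sample polygons shipped with sf (North Carolina counties)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Colourblind-friendly palette from base R, with no extra dependencies
pal <- hcl.colors(7, palette = "viridis")

p <- ggplot(nc) +
  geom_sf(aes(fill = BIR74), colour = "grey30", linewidth = 0.1) +
  scale_fill_gradientn(colours = pal, name = "Live births, 1974-78") +
  annotation_scale(location = "bl") +                       # scale bar, bottom left
  annotation_north_arrow(location = "tr",                   # north arrow, top right
                         style = north_arrow_fancy_orienteering) +
  theme_minimal()

p
```

The plot object can then be combined with others using patchwork, or written to disk with ggsave().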
Spatial statistics
- DescTools a collection of miscellaneous basic statistical functions for descriptive statistics, including Lin’s concordance correlation coefficient (CCC).
- gstat package for modelling, prediction and simulation of geostatistical data in one, two or three dimensions, inc. (semi)variograms.
- spatial package for kriging and point pattern analysis.
- mgcv generalized additive (mixed) models, some of their extensions and other generalized ridge regression with multiple smoothing parameter estimation by (restricted) marginal likelihood, generalized cross-validation and similar, or using iterated nested Laplace approximation for fully Bayesian inference.
- spatstat package for point pattern analysis.
- spdep for creating spatial weights matrices and testing spatial dependence using the Moran’s I index.
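The spdep workflow for testing spatial dependence can be sketched as follows, again using the North Carolina sample data bundled with sf. The variable tested (a crude SIDS rate), the queen contiguity rule and the row-standardised weights are assumptions chosen for illustration; appropriate choices depend on the data and question.

```r
library(sf)
library(spdep)

# Sample polygons shipped with sf, projected to a planar CRS (EPSG:32119)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc <- st_transform(nc, 32119)

# Illustrative variable: SIDS cases per live birth, 1974-78
nc$sid_rate <- nc$SID74 / nc$BIR74

# Neighbours: polygons sharing a boundary or a vertex (queen contiguity)
nb <- poly2nb(nc, queen = TRUE)

# Row-standardised spatial weights
lw <- nb2listw(nb, style = "W", zero.policy = TRUE)

# Global Moran's I test for spatial autocorrelation
mt <- moran.test(nc$sid_rate, lw, zero.policy = TRUE)
mt
```

A Moran's I estimate well above its expectation, with a small p-value, suggests that neighbouring areas have more similar values than would be expected under spatial randomness.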
Machine learning
MLR3 ecosystem (This is the ML ecosystem that we currently use the most in TESS Lab and can provide the most support with)
- mlr3 efficient, object-oriented programming on the building blocks of machine learning. The package is geared towards scalability and larger datasets. Add-on packages in the MLR3 ecosystem provide additional functionality. See also the open source book Applied Machine Learning Using mlr3 in R (the best starting point for using MLR3).
- mlr3verse this wrapper package simplifies the installation and loading of the core ‘mlr3’ packages.
- mlr3spatiotempcv extends the mlr3 ML framework with spatio-temporal resampling methods to account for the presence of spatiotemporal autocorrelation (STAC) in predictor variables.
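To show why spatial resampling matters, the sketch below compares random and spatial cross-validation on the "ecuador" example task that ships with mlr3spatiotempcv. The learner (a simple decision tree), the number of folds and the seed are arbitrary assumptions made for illustration; it assumes mlr3, mlr3spatiotempcv and rpart are installed.

```r
library(mlr3)
library(mlr3spatiotempcv)

# Example spatial classification task bundled with mlr3spatiotempcv
task <- tsk("ecuador")

# A simple learner from mlr3 core; probabilities are needed for AUC
learner <- lrn("classif.rpart", predict_type = "prob")

# Random folds versus folds built by clustering the observation coordinates
rsmp_random  <- rsmp("cv", folds = 5)
rsmp_spatial <- rsmp("spcv_coords", folds = 5)

set.seed(42)
rr_random  <- resample(task, learner, rsmp_random)
rr_spatial <- resample(task, learner, rsmp_spatial)

# Compare cross-validated AUC under the two schemes
auc_random  <- rr_random$aggregate(msr("classif.auc"))
auc_spatial <- rr_spatial$aggregate(msr("classif.auc"))
c(random = unname(auc_random), spatial = unname(auc_spatial))
```

Random folds often give more optimistic scores on spatially autocorrelated data, because training and test points can sit very close together; spatial folds are intended to give a more honest estimate of performance at new locations.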
‘Ecosystem’-agnostic ML tools
- ranger fast implementation of Random Forests, particularly suited for high dimensional data. (Can be used within MLR3).
- lightgbm (Light Gradient Boosting Machine) a gradient boosting framework for improving the performance of tree-based algorithms.
- torch functionality to define and train neural networks similar to ‘PyTorch’ and supports low-level tensor operations and ‘GPU’ acceleration.
- luz is a higher-level ‘API’ for ‘torch’, providing utilities to reduce the amount of code needed for common tasks, abstract away torch details and make the same code work on both the ‘CPU’ and ‘GPU’ (see also example U-net implementation).
- catboost high-performance open source library for gradient boosting on decision trees, slightly more complex to set up.
- sits satellite image time series analysis for Earth Observation Data Cubes. Much more functionality than gdalcubes with complete pipeline functionality for using data for land use mapping but is a little less versatile.
Tidymodels ecosystem
- tidymodels is a collection of packages for modelling and machine learning using tidyverse principles.
Note that additional potentially useful packages are listed on the CRAN spatial analysis page.
