Using Computers for Research

Computers are one of the most important tools that we use almost daily to enable our research in TESS Lab. However, the default configurations of most computers are usually not optimised for research applications. This page contains some tips to help users start optimising their computers for research applications.

Discovering hidden content

Windows and Mac computers hide certain features by default, such as file extensions and hidden files and directories. This is a problem in research applications, where we often want to see these (for example to review, save or share data, or to select the desired file type when vector and raster versions of a file share the same name). To reveal this hidden content, follow these steps:

Windows

Open File Explorer from the taskbar. Select View > Show, then select Hidden items to view hidden files and folders.
Open File Explorer from the taskbar. Select View > Show, then select File name extensions to view file name extensions.

macOS

Find Terminal under Launchpad > Other > Terminal, then run the following commands:
Type "defaults write com.apple.finder AppleShowAllFiles true" (without the quotation marks) and press Enter
Type "killall Finder" (without the quotation marks) and press Enter
Once both commands have run, you should see your hidden files in Finder and any temporary files saved on the desktop.

Scientific Computing in R and RStudio

Installing R and RStudio

We make extensive use of Free and Open Source Software for most of our data handling, analysis and visualisation in TESS Lab (unless we have good reason not to). The capability, versatility, and reproducibility of R far exceed those of most proprietary tools and make it easy for us to help one another, recycle our code and skills for future projects, and share our code alongside outputs such as publications and reports. We believe there is substantial value in developing foundational skills in this programming language for many different career pathways, and that anyone can learn R with the right support.

If you’ve never used R before, you need to start by downloading and installing both R (the program) and an Integrated Development Environment application such as RStudio that makes it easier to work with R. You can get these from Posit.

Check also whether you need to be added to the R channel on the TESS Lab Slack Workspace for support.

Configuring RStudio

Once you have R and RStudio installed on your computer, we recommend making the following changes to the default configuration. These help ensure your work is reproducible by forcing you to save your code properly, and make it easier to keep track of code and outputs as you work across multiple projects.

Setting options in RStudio
- Tools -> Global Options -> General
  - Workspace
    - Deselect ‘Restore .RData into workspace at startup’
    - Set ‘Save workspace to .RData on exit:’ to ‘Never’
  - History
    - Deselect ‘Always save history (even when not saving .RData)’
- Reviewing package locations (environment variable)
  - When programming in R, we’ll usually be downloading and using ‘packages’ (mini libraries of code with functions for specific applications). It’s worth understanding where these are being saved by running the “.libPaths()” command in the console. Packages should not be saved in synced locations such as OneDrive. The defaults usually work, but occasionally it is necessary to set the “R_LIBS_USER” environment variable manually. In Windows, do this via: System properties -> Environment variables -> Create variable ‘R_LIBS_USER’ with value “C:\Program Files\R\R-4.3.1\library” or equivalent.
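
As a rough illustration of the check described above, the sketch below lists the active library paths and flags any that appear to sit inside a synced folder. The helper name flag_synced_paths and the folder names it searches for are assumptions made for this example (they are not part of R), so adjust them to match the sync services you actually use.

```r
# Flag library paths that look like they sit inside a cloud-synced folder.
# `flag_synced_paths` and the default `markers` are illustrative assumptions.
flag_synced_paths <- function(paths,
                              markers = c("OneDrive", "Dropbox", "Google Drive")) {
  vapply(paths, function(p) {
    # Case-insensitive, literal (non-regex) match against each marker
    any(vapply(markers,
               function(m) grepl(tolower(m), tolower(p), fixed = TRUE),
               logical(1)))
  }, logical(1), USE.NAMES = FALSE)
}

# Where is R currently installing and loading packages from?
lib_paths <- .libPaths()
print(data.frame(path = lib_paths, synced = flag_synced_paths(lib_paths)))

# Current value of the R_LIBS_USER environment variable in this session
Sys.getenv("R_LIBS_USER")
```

If any row shows synced = TRUE, that is a prompt to consider moving the library and setting R_LIBS_USER as described above.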

Getting Started with Version Control using Git and GitHub

Git is a widely used version control system that helps keep track of differences in files in a ‘repository’. It can be extremely useful for keeping track of what changed when and why in your code, which is particularly useful as codebases become more complicated.

We use GitHub as a cloud service to host our Git repositories with code and small files (e.g. small datasets, or output data visualisations). This makes it much easier to (i) keep track of what changed when and why in our code, (ii) back up our work on a secure cloud server, (iii) migrate our code between machines (esp. useful if working on multiple systems for development and production runs), (iv) collaborate on developing code with supervisors, peers and project partners, and (v) share our codebase either with invited users or publicly (for example alongside an output such as a publication or report, often via the Zenodo permanent open archive to ensure it remains accessible).

For comprehensive guidance on everything from getting started to advanced troubleshooting, we recommend the open-source book Happy Git with R. Speak with Andy to check whether it makes sense to use Git and GitHub for your project.

Installing Git & GitHub

Download Git and install it on your computer.
Register for a GitHub user account.
Send your username to Andy to be added to the TESS Lab GitHub Organization.
Read this introduction to version control, Git and GitHub Tutorial to learn about the basics.

Setting up GitHub

GitHub is a very powerful tool that can be used in many different ways; what follows is a guide to one workflow for getting started with RStudio.

On the TESS Lab GitHub Organization, create a new repository (if necessary):
- Select the privacy level (private/public); most people prefer new repos to be private while in development
- Initialize repo with
  - Check “Add a README file”
  - Check “Add .gitignore” (This file lets Git know what kind of files should not be included in the repository.)
  - Check “Choose a licence” and select the appropriate licence (for most TESS Lab work, GNU GPL v3 is best)
- Click on ‘Create Repository’
Open the new repository webpage, click on the green ‘Code’ button, and copy the repository HTTPS URL (by clicking on the clipboard icon)
Create a new directory called “workspace” in the root directory on your computer. (Using a common layout helps us to migrate projects efficiently between different machines. To avoid conflicts between versions, it is important not to create Git repositories in a directory that is linked with any other version control or cloud storage such as OneDrive, Google Drive, Dropbox etc.)
- On a Windows machine the root directory is usually the C drive
- On Mac the root directory is usually called ‘Home’ (in Finder press Shift-Command-H to navigate to the Home).
Create a new RStudio project
- Open a new instance of RStudio
- File
- New Project
- Checkout from Version Control
- Select Git
  - Enter the URL for the repository in the top box
  - Confirm that the repository name is in the middle box (sometimes this auto populates but not always)
  - Confirm that the new project will be created in the “workspace” subdirectory in your computer’s root directory
- When prompted, type in your GitHub user name (but DO NOT type in your GitHub password!)
Approve access via Git Credential Manager
- Select the blue ‘Sign in with browser’ button
  - Expand options to read details
  - Select authorise git-ecosystem
  - You will need to enter GitHub credentials
  - (This doesn’t notify you that it has worked successfully, so don’t panic about the blank browser page)

Congratulations, this should have created a new RStudio project in your workspace directory that is linked with Git and GitHub.

Using GitHub

Once the project is established locally on your computer, you should be able to work on your code as normal, and then interact with GitHub through the following steps:

Save (files)
Commit (with an informative commit message)
Pull (from GitHub to local, checking for updates & conflicts)
Push (from local to GitHub)
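
RStudio’s Git pane performs these steps through buttons, but it can help to see the Git commands behind them. The sketch below is one possible way to express the routine from R, assuming Git is installed and available on your PATH and that your working directory is the project root. The function names git_sync_steps and run_git_steps are hypothetical, and by default nothing is executed: the commands are only printed.

```r
# Build the Git arguments behind the Save > Commit > Pull > Push routine.
# Staging ("git add") is part of the Commit step in RStudio's Git pane.
# Both function names are hypothetical helpers for this illustration.
git_sync_steps <- function(commit_message, files = ".") {
  stopifnot(is.character(commit_message),
            length(commit_message) == 1,
            nzchar(commit_message))
  list(
    stage  = c("add", files),
    commit = c("commit", "-m", commit_message),
    pull   = "pull",
    push   = "push"
  )
}

# Print the commands (dry run) or run them in order, stopping on failure.
run_git_steps <- function(steps, dry_run = TRUE) {
  for (args in steps) {
    if (dry_run) {
      message("git ", paste(args, collapse = " "))
    } else {
      status <- system2("git", shQuote(args))
      if (status != 0) {
        stop("git ", args[1], " failed; resolve this before continuing",
             call. = FALSE)
      }
    }
  }
  invisible(steps)
}

# Dry run: shows what would be executed without touching the repository
steps <- git_sync_steps("Fix axis labels in biomass plot")
run_git_steps(steps)
```

Stopping at the first failure matters most at the Pull step, since a conflict there should be resolved before anything is pushed.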

Best practice for reproducible coding in RStudio:

To help avoid confusion and bad practices when working with R code, we recommend changing the following default options under Tools -> Global Options:

Deselect “Restore most recently opened project at startup”
Deselect “Restore previously open source documents at startup”
Deselect “Restore .RData into workspace at startup”
Set “Save workspace to .RData on exit:” to Never
Deselect “Always save history”

More Tips and pitfalls to look out for when getting started:

Git and GitHub are powerful tools but things sometimes go wrong; if you get stuck with anything, feel free to ask others in TESS Lab via the R Slack channel for advice on resolving the problem safely.
Always use the main .Rproj file to open your project so that the links with GitHub work properly (don’t open files within the repository directly).
It is usually helpful to include ‘data’ and ‘plots’ subdirectories within your project, to help keep the filespace streamlined and intuitive for you and others to navigate.
Write useful information about your project in the readme file, including the name and contact information of the creator and a short description of the repo structure and files.
Use relative file paths by default for inputs and outputs, to maximise the portability of your codebase between different machines.
GitHub is not good at handling ‘large’ datasets, so avoid including large files in your commits (GitHub blocks files larger than 100 MB, and smaller is better). If your project draws on or produces ‘large’ files, these are usually best saved in another directory and accessed using absolute file paths where necessary (this also makes it easier to keep track of what has been backed up, so is usually preferable to using gitignore).
If you accidentally commit large files and cannot push to GitHub, consider running "git reset --mixed HEAD~n" in the Git terminal, where n is the number of commits to undo. This removes the commit(s) and unstages the changes without altering the files themselves: the working directory is left untouched and only the index changes. Mixed is the default mode of the git reset command.
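
To catch oversized files before they reach a commit, a quick size check can help. The sketch below is a minimal example under stated assumptions: find_large_files is a hypothetical helper (not part of any package), the 100 MB default mirrors the limit mentioned above, and the demonstration runs in a temporary directory with a deliberately tiny threshold so that it does not touch your project.

```r
# List files above a size threshold so they can be moved out of the
# repository before committing. `find_large_files` is a hypothetical helper.
find_large_files <- function(root = ".", limit_mb = 100) {
  files <- list.files(root, recursive = TRUE, full.names = TRUE)
  size_mb <- file.size(files) / 1024^2
  too_big <- !is.na(size_mb) & size_mb > limit_mb
  data.frame(file = files[too_big],
             size_mb = round(size_mb[too_big], 1),
             stringsAsFactors = FALSE)
}

# Demonstration in a throwaway directory, using a 0.1 MB threshold
demo_root <- tempfile("demo_project")
dir.create(file.path(demo_root, "data"), recursive = TRUE)
writeBin(raw(200000), file.path(demo_root, "data", "big.bin"))  # 200,000 bytes
writeLines("x", file.path(demo_root, "data", "small.txt"))      # a few bytes
print(find_large_files(demo_root, limit_mb = 0.1))
```

In a real project opened via its .Rproj file, calling find_large_files() with no arguments checks the project directory using a relative path, in keeping with the tip above.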

Archiving and Sharing Code and Data

It is usually necessary for us to create permanent open access versions of our finished scientific code and data to ensure that our work can be found, reviewed and used by others, whether because we believe in open science approaches, or to comply with the requirements of our funders and publishers.

There are several repository services that can support such publications and issue persistent Digital Object Identifiers (DOIs) that can be linked and cited in reports. We usually use Zenodo, a general-purpose open access repository created by OpenAIRE and CERN that has good integration with GitHub and allows uploads of up to 50 GB of files. Zenodo supports updates to code, with older versions remaining accessible and the main DOI always directing to the latest version, which is valuable for instance during peer review processes. The TESS Lab Organization on GitHub is already linked with Zenodo, making it easy to add public repositories.

When you’re ready to publish your project repository, ask Andy to:

Navigate to the login page for Zenodo.
Click Log in with GitHub.
Review the information about access permissions, then click Authorize zenodo.
Navigate to the Zenodo GitHub page.
To the right of the name of the repository you want to archive, toggle the button to On.

Zenodo archives your repository and issues a new DOI each time you create a new GitHub release. Follow the steps at “Managing releases in a repository” to create a new one.

Safeguarding credentials for accessing APIs

Using some R packages requires users to provide credentials (e.g., a username and key/token), which we usually want to omit from public scripts. One simple approach is to use environment variables. Your home directory often contains a file called .Renviron (if not, you can create it). In .Renviron, add something like:

CDS_USERNAME = "AndyCunliffe"
CDS_KEY = "eifgojweiofghjweoij"

Then, to access these variables in your R script you can just do:

cds_username <- Sys.getenv("CDS_USERNAME")
cds_key <- Sys.getenv("CDS_KEY")

Remember never to upload your .Renviron file to GitHub; it should always be kept outside of your project. Add a note in your project readme advising that other users need to create their own username and key, and direct them to the correct URL. Other more complex solutions are usually only needed for deployment. To help ensure that you’re creating and editing the right .Renviron file, you can use the command usethis::edit_r_environ(scope = "user").
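
One thing to be aware of is that Sys.getenv() returns an empty string when a variable is missing, so a script can carry on with blank credentials and fail later in a confusing way. The sketch below shows one possible guard. get_credential is a hypothetical helper name, and TESS_DEMO_KEY is a placeholder variable set only for this demonstration.

```r
# Read a credential from an environment variable, failing early with a
# helpful message if it is missing. `get_credential` is a hypothetical helper.
get_credential <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable '", name, "' is not set. ",
         "Add it to your user-level .Renviron file and restart R.",
         call. = FALSE)
  }
  value
}

# Demonstration with a placeholder value set for this session only
Sys.setenv(TESS_DEMO_KEY = "not-a-real-key")
demo_key <- get_credential("TESS_DEMO_KEY")
Sys.unsetenv("TESS_DEMO_KEY")
```

In a real script you would call, for example, get_credential("CDS_KEY") after adding that variable to .Renviron as shown above.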

TESS High-Performance Computing Workstation

In TESS Lab we have two high-performance computing workstations. The first is a Linux (Ubuntu) 3XS machine supporting intensive computing and geospatial processing analyses, including machine learning analysis of multi-scale remote sensing datasets (using R) and structure-from-motion photogrammetry (using OpenDroneMap and/or Pix4D). The second is a Windows workstation. If you think that you might need access to these resources, please contact Andy.

Technical specifications

CPU	Dual Intel 16 Core Xeon Gold 5218
RAM	768GB RAM (12 X Samsung 64GB Load-Reduced DDR4 2666 MHz ECC)
GPU	NVIDIA A2000 6GB GDDR6 With ECC Ampere Graphics Card
SSD	1.92TB PM9A3 M.2 NVMe Enterprise SSD
Storage	24TB storage (6 X 4TB HDD Configured as a RAID 5, providing 18TB storage with resilience to hardware failure)