
worldfoodecosystems2023

Welcome to the last practical.

Congratulations, you have come a very long way…

In the previous two sessions, we focused on implementing command-line-based data retrieval and analysis, moving away from a point-and-click approach.

You will probably have noticed that the level has gone up, with more and more pieces of code left open for you to fill in.

Make sure you have found your way through the first and fifth practicals, as this last practical builds upon them.

Step 1: The case study

The case we are investigating today is the salinization of freshwater lakes. Recently, a global dataset of surface water salinity - with measurements between 1980 and 2019 - was published here. The paper reports on the dataset and how it was established. In this practical we will analyze the dataset to answer the following questions:

Problem simplification

Much like all problems, we’ll need to simplify and define this one as well:

| Building block | Decision |
| --- | --- |
| Geographic scale | Points: measurement points of salinity in the database. Each separate point is considered a location (regardless of whether two points lie in the same water body). |
| Temporal scale | We will compare averages over 1980-1990 with averages over 2005-2015: only stations with >5 years of measurements in both epochs are considered. |
| Assumption | We assume that a water deficit (low precipitation combined with high evaporation, see for example here) is linked to higher salinity. |
| Dimensions | We focus on (i) a quantified rainfall deficit and (ii) the biome map. |
| Dimension description | The TerraClimate dataset (Climate water deficit band) and the OpenLand biome map. |
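The epoch rule in the table above can be sketched in R. This is a hypothetical illustration only: the column names (`station_id`, `year`, `EC`) are assumptions, not the actual names in the Thorslund et al. dataset.

```r
# Sketch of the epoch filter: keep only stations with >5 years of
# measurements in BOTH 1980-1990 and 2005-2015, then average per epoch.
library(dplyr)

station_means <- salinity %>%
  mutate(epoch = case_when(
    year >= 1980 & year <= 1990 ~ "1980-1990",
    year >= 2005 & year <= 2015 ~ "2005-2015"
  )) %>%
  filter(!is.na(epoch)) %>%                     # drop years outside both epochs
  group_by(station_id, epoch) %>%
  summarise(n_years = n_distinct(year),
            mean_EC = mean(EC, na.rm = TRUE),
            .groups = "drop") %>%
  filter(n_years > 5) %>%                       # >5 years within the epoch
  group_by(station_id) %>%
  filter(n_distinct(epoch) == 2)                # station qualifies in both epochs
```

The resulting table has one row per station and epoch, which makes the 1980-1990 vs. 2005-2015 comparison a simple reshape or join.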

Data description

Now that we have described the problem, we can describe the data we’ll use

| Dataset | Type | Source | Access point |
| --- | --- | --- | --- |
| EC sampling points | Vector: points | Thorslund et al. 2020 | here |
| TerraClimate, water deficit | Raster | Derived from the TerraClimate collection | Google Earth Engine Catalogue |
| Biome map | Raster | OpenLand potential biomes | Google Earth Engine Catalogue |

The datasets by Thorslund et al. are very large and need to be pre-processed, as we want to compare data from 1980-1990 with data from 2005-2015 for those stations for which an average can be reliably calculated.

To make this exercise feasible, this preprocessing has already been done: the aggregated CSV file, its conversion into a shapefile, and the original file can all be found here.
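Once downloaded, the pre-processed files can be read into R for a first look. A minimal sketch, assuming the `sf` package is installed; the file names below are placeholders, not the actual names of the downloaded files.

```r
# Read the pre-processed data: the aggregated table and its shapefile version.
library(sf)

ec_table  <- read.csv("salinity_aggregated.csv")   # aggregated per-station averages
ec_points <- st_read("salinity_points.shp")        # same stations as spatial points

head(ec_table)                 # inspect the attribute columns
plot(st_geometry(ec_points))   # quick map of the station locations
```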

Now we are ready for the next step: we'll first explore the original and the pre-processed data in R.