Welcome to the last practical.
Congratulations, you have come a very long way…
In the previous two sessions, we focused on implementing command-line based data retrieval and analysis, and tried to move away from a point-and-click approach.
You will probably have noticed that the difficulty has gone up, with more and more pieces of code left open for you to fill in.
Make sure you have found your way through the first practical as well as the fifth practical, as this last practical builds upon both.
Step 1: The case-study
The case we are investigating today is the salinization of freshwater lakes. Recently, a global dataset of surface water salinity, with measurements between 1980 and 2019, was published here. The paper reports on the dataset and how it was established. In this practical we will analyze the dataset to answer the following questions:
- Has salinity, as measured by electrical conductivity (EC), increased or decreased in global freshwater lakes?
- Is salinity of the water linked to rainfall deficits?
- Are increases/decreases in salinity different across different biomes?
- Which local drivers can influence salinity trends?
Problem simplification
Much like all problems, we’ll need to simplify and define this one as well:
| Building block | Decision |
|---|---|
| Geographic scale | Points: measurement points of salinity in the database. Each separate point is considered to be a location (regardless of whether two points are taken in the same water body) |
| Temporal scale | We will compare averages over 1980-1990 with averages over 2005-2015: only stations that have more than 5 years of measurements in both epochs are considered |
| Assumption | We assume that a water deficit (low precipitation combined with high evaporation; see for example here) is linked to higher salinity |
| Dimensions | We focus on (i) a quantified rainfall deficit and (ii) the biome map |
| Dimension description | The TerraClimate dataset (Climate water deficit band) and the OpenLand Biome map |
Data description
Now that we have described the problem, we can describe the data we’ll use:
| Dataset | Type | Source | Access point |
|---|---|---|---|
| EC sampling points | Vector: points | Thorslund et al. 2020 | here |
| TerraClimate, water deficit | Raster | derived from the TerraClimate Collection | Google Earth Engine Catalogue |
| Biome map | Raster | OpenLand potential Biomes | Google Earth Engine Catalogue |
The datasets by Thorslund et al. are very large and need to be pre-processed: we want to compare data from 1980-1990 with data from 2005-2015, but only for those stations for which an average can be reliably calculated in both epochs.
To make this exercise feasible, this pre-processing has already been done: the aggregated CSV file and its conversion into a shapefile, as well as the original file, can be found here.
Now we are ready for the next step: we’ll first explore the original and the pre-processed data in R.
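To give an idea of what such a pre-processing step involves, here is a minimal sketch of the epoch aggregation. It is written in Python purely for illustration and is not the script that produced the provided files. The function names, the `(station_id, year, ec)` record layout, the inclusive epoch boundaries, and the reading of "more than 5 years" as more than five distinct calendar years are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean

# Assumed epoch boundaries (inclusive on both ends) and threshold.
EPOCHS = {"early": (1980, 1990), "late": (2005, 2015)}
MIN_YEARS = 5  # a station needs MORE than this many distinct years per epoch


def epoch_of(year):
    """Return the epoch name a year falls in, or None if it is outside both."""
    for name, (start, end) in EPOCHS.items():
        if start <= year <= end:
            return name
    return None


def aggregate(records):
    """Average EC per station and epoch.

    records: iterable of (station_id, year, ec) tuples (hypothetical layout).
    Returns {station_id: {"early": mean, "late": mean, "change": late - early}}
    for stations with enough measured years in both epochs.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for station, year, ec in records:
        epoch = epoch_of(year)
        if epoch is not None:
            grouped[station][epoch].append((year, ec))

    result = {}
    for station, by_epoch in grouped.items():
        means = {}
        for name in EPOCHS:
            observations = by_epoch.get(name, [])
            distinct_years = {year for year, _ in observations}
            if len(distinct_years) > MIN_YEARS:
                means[name] = mean(ec for _, ec in observations)
        if len(means) == len(EPOCHS):  # station qualifies in both epochs
            means["change"] = means["late"] - means["early"]
            result[station] = means
    return result


# Made-up demo data: station A has 6 years in each epoch,
# station B has only 5 years in the early epoch and is dropped.
demo = (
    [("A", y, 100.0) for y in range(1980, 1986)]
    + [("A", y, 150.0) for y in range(2005, 2011)]
    + [("B", y, 200.0) for y in range(1980, 1985)]
    + [("B", y, 300.0) for y in range(2005, 2011)]
)
summary = aggregate(demo)
print(summary)
```

Only stations that pass the threshold in both epochs end up in the summary, which is why the aggregated file contains far fewer points than the original dataset.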