Exercise 2: opening and analyzing the data in QGIS
Now that we have all the data in our folder, we can open QGIS and load the vector layer (the shapefile of the watersheds) and the two raster files (the biodiversity and NDVI tif files).
Now we get to the core objective of the exercise: what is the link between biodiversity and available vegetation?
To answer this question, we’ll need to simplify by making decisions on the following building blocks (see course 1):
- Geographic scale – spatial unit
- Temporal scale
- Boundary conditions/assumptions
- Dimensions (which processes and structures will you account for, which not)
- Dimension descriptions (how do you approximate/describe the dimension)
Of course, for the purpose of this exercise, these decisions have already been taken, and are summarized here:
| Building block | Decision |
|---|---|
| Geographic scale | Watersheds |
| Temporal scale | Both datasets need to cover similar timespans and be aggregated over a sufficiently long period |
| Assumption | More primary producers = more available energy, resulting in higher biodiversity |
| Dimensions | We’ll consider vegetation cover and mammal biodiversity |
| Dimension description | MODIS NDVI (as a proxy for vegetation) and mammal species richness by biodiversity.org |
Now that we have simplified, we can take the average species richness and average NDVI per watershed:
- We’ll use the Zonal Statistics function in the QGIS toolbox to calculate this.
- Because both values are averages over a spatial unit, the size of the spatial unit is not explicitly corrected for here.
- In principle, it is good practice to have all your files in the same projection system (see also the Digital Earth courses). In our case, the shapefile is in geographic coordinates (WGS84), while the raster file on species richness is in World Eckert IV. The Zonal Statistics tool we use in this class appears to handle this difference. However, if a tool ever fails or gives odd results, remember that a projection mismatch could be one of the reasons. If you would like to reproject a vector layer, here’s a video on how to do so.
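To make the zonal statistics step more concrete, here is a minimal Python sketch of what a zonal mean computes. It is not how QGIS implements the tool: it assumes the watersheds have already been rasterized onto the same grid as the values (one zone id per cell), and the function name `zonal_mean`, the toy grids, and the nodata value `-9999` are all hypothetical.

```python
from collections import defaultdict


def zonal_mean(zones, values, nodata=None):
    """Return {zone_id: mean value} for two equally shaped 2D lists.

    Cells with zone id None (outside every watershed) or with a value
    equal to `nodata` are skipped.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for zone_row, value_row in zip(zones, values):
        for zone_id, value in zip(zone_row, value_row):
            if zone_id is None or value == nodata:
                continue
            sums[zone_id] += value
            counts[zone_id] += 1
    # Plain mean of the cell values in each zone; the area of the
    # zone itself is not corrected for.
    return {zone_id: sums[zone_id] / counts[zone_id] for zone_id in sums}


# Hypothetical toy data: two watersheds (ids 1 and 2) on a 3 x 4 grid.
zones = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, None, 2, 2],
]
ndvi = [
    [0.2, 0.4, 0.6, 0.8],
    [0.2, 0.4, 0.6, 0.8],
    [0.3, 0.9, -9999, 0.7],
]

means = zonal_mean(zones, ndvi, nodata=-9999)
for zone_id, mean_value in sorted(means.items()):
    print(f"watershed {zone_id}: mean NDVI = {mean_value:.2f}")
```

Running the same function once on the NDVI grid and once on the richness grid would give the two per-watershed averages that the exercise stores as attribute columns.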
Our vector file now has two extra attribute columns: mean NDVI and mean mammal richness.
Can we now visualize this relationship?
- We are particularly interested in how mammal richness depends on NDVI, or Richness~f(NDVI).
- QGIS (like many other GIS packages) is typically used for visualizing and analyzing geographic (spatial) data.
- However, some basic analytical figures can be made as well, e.g. a scatterplot:
We now have a scatterplot, but the statistical tools for analysing these data in QGIS are limited. So, let’s import the data into RStudio and build a simple statistical regression.
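The course continues in RStudio, but as a language-neutral illustration of what a simple regression computes, here is a minimal Python sketch of an ordinary least squares fit. It assumes that f is a straight line, Richness = intercept + slope × NDVI, which is only one possible choice. The function name `fit_line` and the per-watershed values are made up for illustration and are not results from the exercise data.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Share of the variance in y explained by the fitted line.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical per-watershed means (not real data).
mean_ndvi = [0.1, 0.3, 0.5, 0.7, 0.9]
mean_richness = [12.0, 28.0, 55.0, 70.0, 85.0]

slope, intercept, r_squared = fit_line(mean_ndvi, mean_richness)
print(f"Richness = {intercept:.1f} + {slope:.1f} * NDVI (R^2 = {r_squared:.2f})")
```

A positive slope would be in line with the assumption that more primary producers go together with higher biodiversity; the R² value indicates how much of the variation in richness the straight line accounts for.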