
worldfoodecosystems2023

Step 2: Data Exploration

The first thing to do is to understand the data you are working with.

In the folder you downloaded on the previous page there are 3 types of files:

Unzip the folder you just downloaded, open both CSV files in R, and start a new R script. If you are not sure how to do that, consult the previous practical.

The original database might take a while to load (why?). The lakesaverage.csv should load quickly.
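Once a file is loaded, a quick look at its structure and a numeric summary helps you understand what you are working with. The sketch below wraps this in a small helper. Note that `inspect_csv` is a hypothetical name made up for this example, and the path assumes that `lakesaverage.csv` sits in your working directory.

```r
# Hypothetical helper: read a CSV file and print a quick overview of it.
inspect_csv <- function(path) {
  dat <- read.csv(path)
  cat("Rows:", nrow(dat), " Columns:", ncol(dat), "\n")
  str(dat)             # column names and types
  print(summary(dat))  # min, quartiles, mean and max per numeric column
  invisible(dat)       # return the data frame without printing it again
}

# Assumed file location; adjust the path to where you unzipped the folder.
if (file.exists("lakesaverage.csv")) {
  lakesaverage <- inspect_csv("lakesaverage.csv")
}
```

If the file is not found, check your working directory with `getwd()` and change it with `setwd()`.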

Step 1: Exploring the data

# Let's check out this 'DIFF' column.
# outline = FALSE hides the outliers so the box itself stays readable.
difference <- boxplot(lakesaverage$DIFF, outline = FALSE)
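Besides drawing the plot, `boxplot()` returns a list with the numbers behind it, which is why the result can be stored in an object. Here is a minimal sketch of what that list contains, using a made-up toy vector rather than the lake data:

```r
# Toy values (made up for illustration, not taken from the lake data).
x <- c(-5, -2, -1, 0, 1, 2, 3, 50)

b <- boxplot(x, outline = FALSE)  # outliers are not drawn ...

b$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
b$n      # number of non-missing observations
b$out    # ... but they are still reported here
```

For the lake data, the same elements should be available as `difference$stats`, `difference$n` and `difference$out`.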


Now, let’s do a first check: how well are the EC values in the second epoch associated with those in the first epoch?

plot(?, ?) #give the axis correct names

# Theoretically, if no change occurs, the points should lie more or less on the 1:1 line (the diagonal). Let's plot this line:
abline(a = 0, b = 1, col = "red")
legend(x = 2, y = 1000, legend = c("diagonal line"), col = c("red"), lty = 1)



Because of the two outliers, it’s a bit difficult to see what goes on with the majority of the points…

Build the same plot, but now limit the x and y axes to 2500, give the axes appropriate names, and adjust the legend so that it still falls within the plot. Personalize this graph by adding a title with your name, and upload the graph to the Canvas quiz.

If you don’t know how to do this by heart, Google is your best friend. I found, for example, this resource.
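As a starting point, here is a minimal sketch of the base R arguments involved, run on made-up toy data. The data frame `toy` and its column names `ec_epoch1` and `ec_epoch2` are placeholders for this example only; your own columns will be named differently.

```r
# Toy data (made up); the column names are placeholders.
toy <- data.frame(ec_epoch1 = c(120, 450, 800, 1500, 2300),
                  ec_epoch2 = c(150, 400, 950, 1700, 2100))

plot(toy$ec_epoch1, toy$ec_epoch2,
     xlim = c(0, 2500), ylim = c(0, 2500),  # restrict both axes
     xlab = "EC, first epoch", ylab = "EC, second epoch",
     main = "EC in epoch 2 vs. epoch 1 (your name)")
abline(a = 0, b = 1, col = "red")  # 1:1 line
legend("topleft", legend = "diagonal line", col = "red", lty = 1)
```

Placing the legend with a keyword such as `"topleft"` instead of x and y coordinates keeps it inside the plot region whatever the axis limits are.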

Visually, do you conclude that most points have increased or decreased in their EC content? Does this confirm what you saw in the boxplot?
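A visual impression can be backed up by simply counting the points above and below the diagonal. The sketch below does this for two made-up toy vectors; `ec_epoch1` and `ec_epoch2` are placeholder names, not columns of the actual data set.

```r
# Toy values (made up); replace them with the EC columns of your own data.
ec_epoch1 <- c(120, 450, 800, 1500, 2300)
ec_epoch2 <- c(150, 400, 950, 1700, 2100)

# Points above the 1:1 line increased, points below it decreased.
above <- sum(ec_epoch2 > ec_epoch1, na.rm = TRUE)
below <- sum(ec_epoch2 < ec_epoch1, na.rm = TRUE)
on    <- sum(ec_epoch2 == ec_epoch1, na.rm = TRUE)

cat("Increased:", above, " Decreased:", below, " Unchanged:", on, "\n")
```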

OK, now on to the GEE processing: let’s extract data on water deficit and the biomes for each point in the next exercise.