Data

The soil data in a rain forest plot used also in Mrkvička and Myllymäki (2022) are openly available in Smithsonian at . However, because it is somewhat complicated to download it, for simplicity we below study technically similar type of data that are directly available from the R package spatstat.

The R library spatstat contains a data set bei, which represents a single species from the rainforest data set. This pattern of trees is associated with a few covariates, namely elevation and gradient in the region. Although the analysis of correlation between elevation and gradient is not that interesting, we use them below for showing the methodology.

library(spatstat)
data(bei)
plot(bei.extra$elev)

plot(bei.extra$grad)

#- Plot the data using ggplot. Also preparation of the data.
xy <- expand.grid(x=bei.extra$elev$xcol, y=bei.extra$elev$yrow)
df1 <- data.frame(x=xy$x, y=xy$y, values=as.vector(t(bei.extra$elev$v)), Variable="Elevation")
xy <- expand.grid(x=bei.extra$grad$xcol, y=bei.extra$grad$yrow)
df2 <- data.frame(x=xy$x, y=xy$y, values=as.vector(t(bei.extra$grad$v)), Variable="Gradient")

p1 <- ggplot() +
  geom_raster(data=df1, aes(x=.data$x, y=.data$y, fill=.data$values)) +
  coord_fixed() + ggtitle("Elevation")
p2 <- ggplot() +
  geom_raster(data=df2, aes(x=.data$x, y=.data$y, fill=.data$values)) +
  coord_fixed() + ggtitle("Gradient")
p1 + p2 + plot_layout(ncol=1)

As an (toy) example, we will study the local correlation between two of these covariates.

Data preparation

The dataframes df1 and df2 contain our data now. Let’s put them into a joint dataframe, where the first two columns correspond to the values of the two random fields whose correlations are to be studied, and the third and fourth columns correspond to the x- and y-coordinates where these random fields have been observed. Further, width and height gives the sizes of the grid cells that the x- and y-locations represent (here of equal size).

# Observations should be at the same locations.
# (Otherwise some smoothing methods must be applied first to get values on the same coordinates.)
if(!identical(df1$x, df2$x) || !identical(df1$y, df2$y)) message("Data observed on different locations!")

Var1 <- df1[!is.na(df1$values) & !is.na(df2$values),]
Var2 <- df2[!is.na(df1$values) & !is.na(df2$values),]

dat <- data.frame(Var1 = Var1$values, Var2 = Var2$values,
                  X = Var1$x, Y = Var1$y,
                  width = bei.extra$elev$xstep, height = bei.extra$elev$ystep)
head(dat)
##     Var1      Var2  X Y width height
## 1 120.63 0.2519312  0 0     5      5
## 2 121.94 0.3093527  5 0     5      5
## 3 123.46 0.3284767 10 0     5      5
## 4 125.07 0.3226355 15 0     5      5
## 5 126.65 0.3047866 20 0     5      5
## 6 127.96 0.2715315 25 0     5      5

Local correlation

Vilodomat et al. (2014) procedure requires a smoothing parameter Delta of the local correlation. We use a range of values according to Vilodomat et al. (2014).

Then we can study the local correlations controlling for the false discovery rate (FDR, chosen in the argument typeone) as follows:

Delta <- seq(0.1,0.9,0.1)

# nsim = 1000 runs for about 45 minutes (with one kernel)
system.time(
res <- GET.localcor(data = dat, Delta = Delta, nsim = 1000, typeone = "fdr",
                    varying.bandwidth = FALSE, bandwidth.h = 100, notest=F)
)
plot(res)