Utilizing the spatstat package

In general, it is useful in the point pattern analysis utilize the spatstat package. The workflow utilizing spatstat with the GET package is typically the following: Say we have a point pattern, for which we would like to test a hypothesis, as a ppp object of spatstat E.g.

X <- spruces
X
## Marked planar point pattern: 134 points
## marks are numeric, of storage type  'double'
## window: rectangle = [0, 56] x [0, 38] metres

Testing a simple hypothesis, e.g., complete spatial randomness (CSR)

  1. Use the function envelope of spatstat to create nsim simulations under CSR and to calculate the functions you want. Important: use the option savefuns=TRUE and specify the number of simulations nsim. See the help documentation in the spatstat package for possible test functions (if the argument fun is not given, the function Kest is used, i.e. an estimator of the \(K\)-function).

Making 999 simulations of CSR and estimating \(K\)-function for each of them and data (the argument simulate specifies how to perform simulations under CSR):

env <- envelope(X, nsim=1999, savefuns=TRUE,
                simulate=expression(runifpoint(ex=X)),
                verbose=FALSE)
  1. Perform the test
res <- global_envelope_test(env)
  1. Plot the result
plot(res)

Testing a goodness-of-fit of a parametric model (composite hypothesis case)

  1. Fit the model to your data by means of the function ppm or kppm of spatstat. See the help documentation for possible models.

  2. Use the function GET.composite to create nsim simulations from the fitted model, to calculate the functions you want, and to make an adjusted global envelope test.

  3. Plot the result.

Testing simple hypotheses

Testing complete spatial randomness (CSR)

Let us illustrate the CSR for the spruces data set from the R library spatstat.

X <- unmark(spruces)
par(mfrow = c(1,1), mgp = c(0, 0, 0), mar = c(0, 0, 0, 0))
plot(X, main = "")

Below the function of the spatstat package is used to generate point patterns under CSR (specified in the argument simulate) and to calculate the centred \(L\)-functions (specified below by the arguments fun, correction and transform), which are used here as the test functions.

nsim <- 1999 # Number of simulations
env <- envelope(X, fun = "Lest", nsim = nsim,
  savefuns = TRUE, # save the functions
  correction = "translate", # edge correction for L
  transform = expression(.-r), # centering
  simulate = expression(runifpoint(ex = X)), # Simulate CSR
  verbose = FALSE)
res <- global_envelope_test(env, type = "erl")
plot(res)

It is possible to cut the functions to an interval of distances \([r_{\min}, r_{\max}]\) (at the same time creating a curve_set from env) and perform the test on the functions on this interval only:

cset <- crop_curves(env, r_min = 1, r_max = 7)
# Do the rank envelope test (erl)
res <- global_envelope_test(cset, type = "erl")
plot(res) + ylab(expression(italic(hat(L)(r)-r)))

Testing random labeling of marks

Let now the studied marked point pattern be the spruces data with marks:

mpp <- spruces
par(mfrow=c(1,1), mgp=c(0, 0, 0), mar=c(0, 0, 0, 0))
plot(mpp, main = "")

As in the CSR test, to perform the test of random labelling hypothesis, first the envelope function of spatstat can be used to generate simulations and calculate the test function \(T(r)\) for the data pattern (mpp) and each simulation. Below the estimator of the mark-weighted \(L\)-function, \(L_{mm}(r)\), with translational edge correction is used as the test function. The argument simulate specifies the simulations under the random labeling, i.e., simple permutation of the marks.

nsim <- 1999 # Number of simulations
env <- envelope(mpp, fun = Kmark, nsim = nsim, f = function(m1, m2) { m1*m2 },
  correction = "translate", returnL = TRUE,
  simulate = expression(rlabel(mpp, permute = TRUE)), # Permute the marks
  savefuns = TRUE, # Save the functions
  verbose = FALSE)

Thereafter, the curves can be cropped to the desired \(r\)-interval and centered by the mean of the simulated functions for better visualization, before making the test.

# Crop curves to desired r-interval
curve_set <- crop_curves(env, r_min = 1.5, r_max = 9.5)
# Center the functions, i.e. take \hat{L}_mm(r)-the mean of simulated functions.
curve_set <- residual(curve_set)
# The global envelope test
res <- global_envelope_test(curve_set)
plot(res) + ylab(expression(italic(L[mm](r)-L(r))))

A combined global envelope test

Sometimes it may be desired to base the test on several test functions Mrkvička et al. (2017). Below it is illustrated how the CSR can be tested simultaneously by means of \(L\), \(F\), \(G\) and \(J\) functions for the saplings data set available in the GET library. First some setup:

data(saplings)
X <- as.ppp(saplings, W = square(75))

nsim <- 499 # Number of simulations

# Specify distances for different test functions
n <- 500 # the number of r-values
rmin <- 0; rmax <- 20; rstep <- (rmax-rmin)/n
rminJ <- 0; rmaxJ <- 8; rstepJ <- (rmaxJ-rminJ)/n
r <- seq(0, rmax, by = rstep)    # r-distances for Lest
rJ <- seq(0, rmaxJ, by = rstepJ) # r-distances for Fest, Gest, Jest

Then perform simulations of CSR and calculate the \(L\)-functions saving the simulated patterns and functions:

set.seed(123)
env_L <- envelope(X, nsim = nsim,
  simulate = expression(runifpoint(ex = X)),
  fun = "Lest", correction = "translate",
  transform = expression(.-r), # Take the L(r)-r function instead of L(r)
  r = r,                       # Specify the distance vector
  savefuns = TRUE,             # Save the estimated functions
  savepatterns = TRUE,         # Save the simulated patterns
  verbose = FALSE)
  # The simulations can be obtained from the returned object:
  simulations <- attr(env_L, "simpatterns")

And then the other test functions \(F\), \(G\), \(J\) should be calculated for each simulated pattern:

env_F <- envelope(X, nsim = nsim,
  simulate = simulations,
  fun = "Fest", correction = "Kaplan", r = rJ,
  savefuns = TRUE, verbose = FALSE)
env_G <- envelope(X, nsim = nsim,
  simulate = simulations,
  fun = "Gest", correction = "km", r = rJ,
  savefuns = TRUE, verbose = FALSE)
env_J <- envelope(X, nsim = nsim,
  simulate = simulations,
  fun = "Jest", correction = "none", r = rJ,
  savefuns = TRUE, verbose = FALSE)

All the curves can then be cropped to the desired r-interval \(I\), if needed,

curve_set_L <- crop_curves(env_L, r_min = rmin, r_max = rmax)
curve_set_F <- crop_curves(env_F, r_min = rminJ, r_max = rmaxJ)
curve_set_G <- crop_curves(env_G, r_min = rminJ, r_max = rmaxJ)
curve_set_J <- crop_curves(env_J, r_min = rminJ, r_max = rmaxJ)

and finally the combined global envelope calculated and plotted

res_combined <- global_envelope_test(curve_sets = list(curve_set_L, curve_set_F,
                                          curve_set_G, curve_set_J))
plot(res_combined, labels = c("L(r)-r", "F(r)", "G(r)", "J(r)"))

A one-stage goodness-of-fit test (typically conservative!)

It is possible to perform a one-stage goodness-of-fit test for point process models as follows, accepting that the test may be conservative (or liberal). In literature, it has been recommended that the tests may be used if the test function is not closely related to the estimation procedure that was used to fit the model. However, GET.composite can be used for adjusted tests, see the help file of this function and an example below.

X <- unmark(spruces)
# Minimum distance between points in the pattern
min(nndist(X))
## [1] 1.044031
# Fit a model
fittedmodel <- ppm(X, interaction = Hardcore(hc = 1)) # Hardcore process

Simulating Gibbs process by envelope is slow, because it uses an MCMC algorithm

#env <- envelope(fittedmodel, fun = "Jest", nsim = 999, savefuns = TRUE,
#                correction = "none", r = seq(0, 4, length = 500))

Using direct algorihm can be faster, because the perfect simulation is used here. Therefore, in the following we utilize the function rHardcore:

simulations <- NULL
nsim <- 999 # Number of simulations
for(j in 1:nsim) {
   simulations[[j]] <- rHardcore(beta = exp(fittedmodel$coef[1]),
                         R = fittedmodel$interaction$par$hc,
                         W = X$window)
}
env_HC <- envelope(X, simulate = simulations, fun = "Jest",
  nsim = length(simulations),
  savefuns = TRUE, correction = "none",
  r = seq(0, 4, length = 500),
  verbose = FALSE)
curve_set <- crop_curves(env_HC, r_min = 1, r_max = 3.5)
res_HC <- global_envelope_test(curve_set, type = "erl")
plot(res_HC) + ylab(expression(italic(J(r))))

Note: Conditioning the Gibbs hard-core model on the number of points, the only parameter in the hard-core model is the hard-core distance. It is possible to fix the hard-core distance, e.g. to the minimum distance between two points in the data, and then the test of the hard-core model with fixed hard-core distance is simple (no parameters involved), and thus exact. This example is given in Myllymäki et al (2017) and it is possible to prepare simulations for this case utilizing the function rmh of the spatstat package. In the above example conditioning on the number of points was not employed.

Adjusted global envelope test for composite hypotheses

Let us test the fit of a Matern cluster process for the sapling data as an example of a composite hypothesis test. The adjusted test of the GET package is described in Section 2.3 of Myllymäki and Mrkvička (2020). The procedure was suggested by Baddeley et al (2017) and extended for global envelopes in Myllymäki and Mrkvička (2020).

data(saplings)
saplings <- as.ppp(saplings, W = square(75))
par(mfrow = c(1,1), mgp = c(0, 0, 0), mar = c(0, 0, 0, 0))
plot(saplings, main = "")

First define the \(r\)-distances and number of simulations. Here we set the number of simulations just to 19, for fast exploration of the code, but for serious analysis we recommend at least 499 simulations.

rmin <- 0.3; rmax <- 10; rstep <- (rmax-rmin)/500
r <- seq(0, rmax, by = rstep)
nsim <- 19 # Increase nsim for serious analysis!

The Matern cluster process can be fitted to the pattern using the ppm function of spatstat. This utilizes minimum contrast estimation with the \(K\)-function. Then the adjusted global area rank envelope test can be performed using the function GET.composite. Below we use the centred \(L(r)\) function as the test function. The argument type specifies the global envelope test, see the help file of the global_envelope_test function.

M1 <- kppm(saplings~1, clusters = "MatClust", statistic = "K")
adjenvL <- GET.composite(X = M1, nsim = nsim,
  testfuns = list(L = list(fun="Lest", correction = "translate",
       transform = expression(.-r), r = r)), # passed to envelope
  type = "area", r_min = rmin, r_max = rmax, verbose = FALSE)
## X is a fitted model object of spatstat;
##  using spatstat to simulate the model and calculate the test functions.
plot(adjenvL)