Spatstat FAQ (draft)

  1. What is spatstat?
    spatstat is an R package for the statistical analysis of spatial data, mainly spatial point patterns.

    It is one of the largest contributed packages available for R, with about 300 user-level functions and a 500-page manual.

  2. What kinds of data can spatstat handle?
    Mainly, spatial point patterns in two-dimensional space.

    Very complicated datasets can be handled. The point patterns may be `marked' by real numbers (e.g. trees annotated with their diameters), categorical values (e.g. ants labelled by species), logical values (e.g. on/off), etc. The spatial region where the points are observed can have a very complicated shape (an arbitrary polygon or a binary pixel image mask). The point pattern data can be accompanied by other kinds of covariate data, such as a line segment pattern (e.g. map of geological faults) or a pixel image (e.g. map of terrain elevation). Patterns of many thousands of points can be analysed.
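
    As a minimal sketch of how such datasets are built, the example below creates point patterns with real-valued and categorical marks, plus a polygonal window. The coordinates, diameters and species labels are made-up illustrative values, not real data.

```r
library(spatstat)

# Hypothetical data: five trees in a 10 x 10 plot, with diameters in cm
x    <- c(1.2, 3.4, 5.6, 7.8, 9.0)
y    <- c(2.1, 8.3, 4.5, 6.7, 0.9)
diam <- c(12, 30, 18, 25, 40)

# Rectangular observation window
W <- owin(xrange = c(0, 10), yrange = c(0, 10))

# Point pattern marked by real numbers (diameters)
trees <- ppp(x, y, window = W, marks = diam)

# Point pattern marked by categorical values (supplied as a factor)
species <- factor(c("oak", "elm", "oak", "ash", "elm"))
trees2  <- ppp(x, y, window = W, marks = species)

# A polygonal window; vertices are listed anticlockwise
P <- owin(poly = list(x = c(0, 10, 10, 5, 0),
                      y = c(0, 0, 10, 6, 10)))

summary(trees)
```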

    Currently, spatstat deals only with two-dimensional space, and does not handle 3D or space-time data. (This will change in version 2.)

    Ultimately, spatstat will handle all the major kinds of spatial data: point patterns, regional data, and geostatistical data. Currently, the vast majority of the functions deal with spatial point patterns. (This is unlikely to change in the near future.)

  3. What kind of analysis can spatstat perform?
    spatstat is designed to support a complete statistical analysis of a spatial point pattern dataset. It contains functions for
    • data handling
    • exploratory data analysis
    • model-fitting
    • simulation
    • spatial sampling
    • model diagnostics
    • formal inference
  4. What kind of model-fitting does it do?
    spatstat can fit Poisson point process models, Gibbs point process models and random cluster process models to a point pattern dataset. The models can be spatially homogeneous, or inhomogeneous, with the spatial trend modelled as a function of the cartesian coordinates, and/or a function of spatial covariates. Gibbs models may include interpoint interaction (clustering or repulsion) and dependence on marks.

    Gibbs point process models are fitted by the method of maximum pseudolikelihood or by the Ogata-Huang approximation to maximum likelihood. The user interface is a function ppm similar to the R functions lm or glm, which uses a formula to describe the spatial inhomogeneity and the dependence on covariates or marks. Fitted Gibbs models can be simulated automatically.
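
    The formula interface can be sketched as follows. This assumes a recent spatstat release, in which the point pattern appears on the left-hand side of the formula (older releases used the form ppm(X, ~x)). The dataset 'cells' ships with spatstat; the Strauss interaction radius r = 0.1 is an arbitrary illustrative choice, not a recommendation.

```r
library(spatstat)

# Homogeneous Poisson model (complete spatial randomness)
fit0 <- ppm(cells ~ 1)

# Inhomogeneous Poisson model: log-linear trend in the x coordinate
fit1 <- ppm(cells ~ x)

# Gibbs model with Strauss interpoint interaction
# (r = 0.1 is an illustrative value)
fit2 <- ppm(cells ~ 1, Strauss(r = 0.1))
print(fit2)

# Simulate one realisation of the fitted Gibbs model
Xsim <- simulate(fit2, nsim = 1)[[1]]
```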

    Cluster process models are fitted by the method of minimum contrast. The implementation is experimental and will change in version 2 of the package.
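
    A hedged sketch of cluster process fitting, assuming a recent spatstat release in which the function kppm is available and uses minimum contrast by default. The 'redwood' dataset ships with spatstat; the choice of a Thomas cluster process is illustrative only.

```r
library(spatstat)

# Fit a stationary Thomas cluster process to the redwood data
fit <- kppm(redwood ~ 1, clusters = "Thomas")
print(fit)
```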

  5. I want to test whether the point pattern is random. Can spatstat do that?
    Yes, and much more. spatstat provides facilities for formal inference (such as hypothesis tests) and informal inference (such as residual plots).

    If you want to formally test the hypothesis of Complete Spatial Randomness you can do this using the chi-squared test based on quadrat counts (quadrat.test), the Kolmogorov-Smirnov test based on values of a covariate (ks.test.ppm), graphical Monte Carlo tests based on simulation envelopes of the K function (envelope), or the likelihood ratio test for parametric models (anova.ppm). You can also inspect the residuals from the uniform Poisson process model using diagnose.ppm.
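
    Some of these tests can be sketched as below, assuming a recent spatstat release. The pattern X is simulated here purely for illustration, and the grid size (3 x 3) and the number of simulations (39) are arbitrary choices.

```r
library(spatstat)
set.seed(1)

# Illustrative data: a simulated Poisson pattern in the unit square
X <- rpoispp(lambda = 100)

# Chi-squared test of CSR based on quadrat counts in a 3 x 3 grid
qt <- quadrat.test(X, nx = 3, ny = 3)
print(qt)

# Graphical Monte Carlo test: simulation envelopes of the K function
E <- envelope(X, Kest, nsim = 39, verbose = FALSE)
plot(E)

# Likelihood ratio test of CSR against a trend in the x coordinate
fit0 <- ppm(X ~ 1)
fit1 <- ppm(X ~ x)
anova(fit0, fit1, test = "LRT")
```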

    spatstat provides similar facilities for checking many other point process models. The chi-squared test based on quadrat counts is available for any inhomogeneous Poisson process model fitted to data. Monte Carlo tests based on simulation envelopes are available for any fitted Gibbs model. Residuals and diagnostics are available for any fitted Gibbs model.

  6. What are the main differences between spatstat and other packages?
    (To be completed!)
  7. What is the practical limit on the number of points in a point pattern?
    Plotting a point pattern ..... over 1 million points
    Exploratory analysis (K function, etc) ..... 100,000 points
    Model-fitting ..... 5,000 points
    Complete analysis ..... over 4,000 points
    We have carried out a complete analysis on an astronomical dataset containing 4300 points.
  8. I want to attach multiple marks to each point, e.g. each tree should be marked by its diameter and its species. How do I do this?
    Currently you can attach only a single mark variable (e.g. diameter) to each point; spatstat does not support multiple marks. This capability will be added in version 2. Currently the only package which supports analysis of such data is the MarkedPointProcess package.
  9. When I plot the estimated K-function of my data using the command plot(Kest(X)), I don't understand the meaning of the different coloured curves.
    The different curves are different estimates of the K-function (computed by different edge correction techniques) together with the theoretical K-function for a completely random pattern. For more detailed information, see help(Kest).
  10. I can't seem to control the range of r values in plot(Kest(X)). How can I control it? How is the default plotting range determined?
    To control the range of r values, use the argument xlim as in plot(Kest(X), xlim=c(0, 7)). See help(plot.fv).

    The default range of r values that is plotted depends on the `default plotting range' of the object (of class 'fv') returned by Kest.
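
    A short sketch of how to inspect the curves and restrict the plotted range, using the 'cells' dataset that ships with spatstat. The xlim values are arbitrary, and the column names (iso, theo) are those produced by Kest in recent releases.

```r
library(spatstat)

K <- Kest(cells)

# Printing the object lists each column (curve) and what it represents
print(K)

# Plot all curves, restricting the range of r values shown
plot(K, xlim = c(0, 0.1))

# Plot only the isotropic-corrected estimate and the theoretical curve
plot(K, cbind(iso, theo) ~ r)
```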

  11. How are the r values determined in Kest(X)?

    The default r values for Kest are computed as follows:

    1. The maximum r value is computed by the function rmax.rule, as rmax <- rmax.rule("K", W, lambda), where W is the window containing the data and lambda is the average density of points per unit area. Currently this rule takes the minimum of
      • Ripley's rule of thumb: rmax = one quarter of the smallest side of the enclosing rectangle
      • large sample rule: rmax = sqrt(1000/(pi * lambda))
    2. r values are equally spaced from 0 to rmax with step value 'eps'. If eps is not specified, then eps = rmax/512 so that there are 513 values or 512 intervals.
    You can always override the 'r' values if you need to.
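
    The two steps above can be reproduced by hand, as in the sketch below. It re-implements the rule as described in this answer rather than calling rmax.rule, so the result may differ from what a given spatstat release actually uses. The variable names are illustrative.

```r
library(spatstat)

X <- cells
W <- Window(X)

# Average density of points per unit area
lambda <- npoints(X) / area(W)

# Step 1: the smaller of Ripley's rule of thumb and the large sample rule
B <- Frame(W)   # enclosing rectangle of the window
rmax.ripley <- min(diff(B$xrange), diff(B$yrange)) / 4
rmax.large  <- sqrt(1000 / (pi * lambda))
rmax <- min(rmax.ripley, rmax.large)

# Step 2: 513 equally spaced values, i.e. 512 intervals of width rmax/512
r <- seq(0, rmax, length.out = 513)

# Override the default r values explicitly
K <- Kest(X, r = r)
```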

    Note that when you plot the K function, the range of r values that is plotted depends on the `default plotting range' of the object (of class 'fv') returned by Kest. To override this, add the argument xlim to the plot command.

  12. What determines the pixel dimensions (number of pixels) in an image object? How do I control the pixel dimensions when the image is (a) generated by density.ppp() or setcov(), (b) created by converting other data using as.im(), (c) returned by predict.ppm()?
    When spatstat is first loaded, the default pixel dimensions are 100 x 100 for all of the above commands except predict.ppm, which has a default of 40 x 40. You can reset the default pixel dimensions with the command spatstat.options(npixel=c(nx, ny)), where nx and ny are the numbers of pixels in the x and y directions respectively. This does not apply to predict.ppm.

    Each of the commands (a)-(c) also has an argument that controls the pixel dimensions in that particular case:
    (a) for density(X), where X is a point pattern:
          density(X, dimyx=c(ny, nx))
    (b) for as.im(f, W), where f is a number or function and W is a window:
          M <- as.mask(W, dimyx=c(ny, nx))
          as.im(f, M)
    (c) for predict(obj), where obj is a fitted model (class "ppm"):
          predict(obj, ngrid=c(nx, ny))

    New pixel grids are created by as.mask(). See help(as.mask) for an explanation of the arguments
          dimyx = pixel dimensions = c(ny, nx)
          xy = pixel grid coordinates = list(x, y)
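
    The following sketch puts these arguments together, assuming a recent spatstat release. The grid sizes, the function f and the fitted model are illustrative choices; a single integer is passed to ngrid so that the grid is square.

```r
library(spatstat)

nx <- 64
ny <- 32
X  <- cells
W  <- Window(X)

# (a) kernel density estimate on a grid with ny rows and nx columns
D <- density(X, dimyx = c(ny, nx))
dim(D)

# (b) convert a function of the coordinates to a pixel image on a chosen grid
M <- as.mask(W, dimyx = c(ny, nx))
f <- function(x, y) { x + y }
Z <- as.im(f, M)

# (c) predicted intensity of a fitted model on a 50 x 50 grid
fit <- ppm(X ~ x)
P   <- predict(fit, ngrid = 50)

# Change the session-wide default, then restore the previous value
old <- spatstat.options("npixel")
spatstat.options(npixel = c(nx, ny))
D2 <- density(X)
spatstat.options(npixel = old)
```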
  13. I have several polygonal windows that represent adjoining regions (e.g. counties with some common borders). How can I combine them into a single window?
    Currently this is not possible inside spatstat. You need to use another polygon-handling package to determine which edges are part of the exterior border.