Abstract
This presentation reports progress on a three-year ESRC project to write a software package that adds statistical analysis capability to the ARC/INFO GIS. Within the GIS literature, the term spatial analysis embraces three types of analysis: map-based analysis, modelling and spatial statistical analysis. It is in the third category, spatial statistical analysis (SSA), that GIS are currently weak. If geographical data sets are to be analysed rigorously, a range of techniques is needed, and these require numerical, graphical and cartographical capabilities. The challenge is to provide these in an environment where the user can interact quickly with the data and where these capabilities operate in different but linked windows. Part of the challenge is to draw wherever possible on existing software. The package is linked to a GIS for two reasons: a GIS provides good mapping and spatial database management systems, which are essential for an SSA package, and adding SSA capability to GIS will help GIS to realise its potential as a general-purpose tool for handling spatial data.
Spatial statistics comprises methods that allow the analyst to detect spatial properties and patterns in spatial data (e.g. trend, autocorrelation, concentration and clustering, spatial outliers), perhaps through exploratory methods. Spatial statistics also comprises methods for fitting classical models, such as regression models, in ways that recognise the special problems often raised by spatial data (e.g. non-independence), and for fitting non-classical models, because novel specifications may be required when events unfold in geographic space.
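As an illustration of what detecting autocorrelation involves, the following is a minimal sketch of Moran's I, a standard measure of spatial autocorrelation, computed from a binary connectivity matrix. It is not taken from the package described here; the function names `morans_i` and `line_connectivity` and the toy arrangement of regions in a single row are assumptions made for the example.

```python
import numpy as np


def line_connectivity(n):
    """Binary connectivity matrix for n regions arranged in a row.

    Hypothetical helper: regions i and i+1 are treated as neighbours.
    """
    w = np.zeros((n, n))
    for i in range(n - 1):
        w[i, i + 1] = 1.0
        w[i + 1, i] = 1.0
    return w


def morans_i(values, weights):
    """Moran's I: (n / S0) * (z' W z) / (z' z), with z the mean-centred values."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()          # deviations from the mean
    s0 = w.sum()              # sum of all connectivity weights
    return (x.size / s0) * (z @ w @ z) / (z @ z)


if __name__ == "__main__":
    # A smooth trend along the row gives positive autocorrelation.
    print(morans_i([1, 2, 3, 4, 5], line_connectivity(5)))
    # Alternating values give negative autocorrelation.
    print(morans_i([1, -1, 1, -1], line_connectivity(4)))
```

Values near zero suggest no spatial pattern, while clearly positive or negative values suggest that neighbouring regions tend to be similar or dissimilar respectively. A different definition of connectivity changes the result, which is why the choice of matrix matters.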
Work on developing software for SSA has primarily used one of three approaches: purpose-written software (e.g. INFOMAP, REGARD, MANET, GAM, SpaceStat); loose coupling of existing packages (where the packages simply exchange files, as in the case of linking ARC/INFO and GLIM or ARC/INFO and SpaceStat); and close coupling of existing packages (where one package calls routines from another, as in the case of XLisp-Stat).
Each of these approaches can deliver some of the desired features of a spatial analytical system, but not all of them. (For example, using free-standing software makes it possible to provide excellent visualisation facilities, but means that all the data has to be imported into the system, and features such as mapping or statistical calculations have to be written from scratch.) What is needed is a system that takes advantage of the excellent facilities which already exist in GIS and statistical packages, adds the necessary additional functionality, and provides all of this in an environment that allows dynamic visualisation of the results, within an architecture of "seamless" integration.
The client-server model is one route by which such seamless integration might be achieved. This route has not, we believe, been tried before in this context. A prototype version called SAGE (Spatial Analysis in a GIS Environment) has been developed using just such a client-server architecture, with linkage implemented through Remote Procedure Calls. ARC/INFO acts as a server, while a program consisting of a number of visual and non-visual spatial statistical analysis functions, complementing those in ARC/INFO, acts as a client. In terms of functional components, the architecture consists of a user interface, a system interface and an operational module (Figure 1).
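The general idea of a client requesting data from a server through remote procedure calls can be sketched as follows. This is only an illustration using Python's standard `xmlrpc` modules, not the RPC mechanism or interface used by SAGE or ARC/INFO; the attribute table, the function names `get_attribute`, `start_server` and `client_mean`, and the local address are all assumptions.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical attribute table standing in for data held by the GIS server.
ATTRIBUTES = {"rate": [4.0, 6.5, 5.0, 9.5]}


def get_attribute(name):
    """Server-side procedure: return the values of one attribute."""
    return ATTRIBUTES[name]


def start_server():
    """Start an RPC server on a free local port in a background thread."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(get_attribute)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def client_mean(port, name):
    """Client-side analysis: fetch an attribute remotely and average it."""
    with ServerProxy(f"http://127.0.0.1:{port}") as proxy:
        values = proxy.get_attribute(name)
    return sum(values) / len(values)


if __name__ == "__main__":
    server = start_server()
    print(client_mean(server.server_address[1], "rate"))
    server.shutdown()
    server.server_close()
```

The point of the design is that the data stay with the server and the client asks only for what an analysis needs, rather than importing a whole data set through exchanged files.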
Figure 1: Architecture for integrating ARC/INFO and SSA
(Rows for system components; Columns for functional components)
Figure 2 shows an example of an exploratory data analysis session in which various windows have been opened. The windows are linked, so that objects (regions) highlighted in the map window are highlighted together with the corresponding objects (symbols) in the graphics and spreadsheet table windows. In addition to exploratory and confirmatory spatial statistical techniques (including various forms of regression model and the capacity to define different forms of connectivity matrix), there is a region-building module.
Figure 2: Example of an exploratory data analysis session
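The linkage between windows described above can be sketched with a shared selection that notifies every open view when it changes. This is a minimal illustration of the general technique and not the implementation used in SAGE; the class names `Selection` and `View` and the region identifiers are hypothetical.

```python
class Selection:
    """Shared set of selected region identifiers, observed by all views."""

    def __init__(self):
        self._ids = frozenset()
        self._listeners = []

    def subscribe(self, callback):
        """Register a function to call whenever the selection changes."""
        self._listeners.append(callback)

    def set(self, ids):
        """Replace the selection and notify every subscribed view."""
        self._ids = frozenset(ids)
        for callback in self._listeners:
            callback(self._ids)


class View:
    """Hypothetical window (map, scatterplot or table) linked to a selection."""

    def __init__(self, name, selection):
        self.name = name
        self.highlighted = frozenset()
        self._selection = selection
        selection.subscribe(self._on_change)

    def _on_change(self, ids):
        # A real window would redraw here; the sketch only records the state.
        self.highlighted = ids

    def brush(self, ids):
        """User selects objects in this window; all linked windows follow."""
        self._selection.set(ids)


if __name__ == "__main__":
    selection = Selection()
    views = [View(name, selection) for name in ("map", "scatterplot", "table")]
    views[0].brush({3, 7})  # select two regions in the map window
    for view in views:
        print(view.name, sorted(view.highlighted))
```

Because each view talks only to the shared selection, a new type of window can be added without changing the existing ones.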