Stephen Wise, Robert Haining and Paola Signoretta
Department of Geography and Sheffield Centre for Geographic Information
and Spatial Analysis,
University of Sheffield, United Kingdom.
Email: s.wise@shef.ac.uk
A consistent theme in recent work on developing exploratory spatial data analysis (ESDA) has been the importance attached to visualization techniques, often justified in two ways:
Numerous software packages have been developed which provide visualization facilities to help with the analysis of area data. This poster will use one developed at Sheffield to present:
Figure 1: Screenshot of a SAGE session
Figure 1 illustrates some of the key features of the SAGE system:
SAGE provides a range of graphical and numerical tools for undertaking ESDA. In order to assess the effectiveness of these tools, a conceptual model has been developed which has two elements:
Exploratory Spatial Data Analysis has certain key characteristics:
Spatial data can be modelled as having two components:
Both the spatial and non-spatial elements of spatial data can be considered to have these two components as shown in the table below:
              Smooth                              Rough
Non-Spatial   Properties of distribution,         Outliers in distribution
              e.g. median, interquartile range
Spatial       Trend; Spatial autocorrelation      Localised clusters of high
                                                  values; Spatial outliers
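The non-spatial row of the table can be illustrated with a short sketch. It is not taken from SAGE: the function name, the sample rates and the 1.5 x IQR fence (the usual boxplot convention) are all assumptions made for illustration.

```python
from statistics import median, quantiles

def smooth_and_rough(values, fence=1.5):
    """Summarise a batch of values as non-spatial 'smooth' and 'rough'.

    Smooth: median and interquartile range (IQR).
    Rough: values lying beyond the boxplot fences, i.e. more than
    `fence` * IQR below the lower quartile or above the upper quartile.
    The 1.5 default is the common boxplot convention, assumed here.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - fence * iqr, q3 + fence * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return {"median": median(values), "iqr": iqr, "outliers": outliers}

# Hypothetical uptake rates (percent) for nine areas; not the Sheffield data.
rates = [71.0, 74.5, 69.0, 73.0, 72.0, 70.5, 75.0, 41.0, 72.5]
summary = smooth_and_rough(rates)
print(summary)
```

In this made-up batch the single very low rate is the only value flagged as rough; the median and IQR describe the smooth part of the distribution.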
According to Cleveland (1994), statistical graphs are used for two purposes, each of which requires the viewer to undertake one or more of three tasks:
Activity: Table Look Up
Description: Reading off value for an individual case
Perceptual tasks required by viewer:
    scanning (relating the case to the axis)
    interpolating (estimating the value of the case from the tick marks on the axis)
    matching (linking the case symbol back to the key)

Activity: Pattern Perception
Description: Identifying trends, patterns or regularities in the whole set of data
Perceptual tasks required by viewer:
    detection (recognizing how relationships between values are coded on the graph e.g. distances between symbols relate to differences in values of observations)
    assembly (grouping objects on the graph together e.g. all cases relating to a given year)
    estimation (of the differences between the grouped cases e.g. Year 1 values tend to be greater than those for Year 2)
Good graphical displays can be defined as those that are 'easy to read' i.e. their design assists the viewer in undertaking the necessary perceptual tasks.
The model can be extended to maps and can therefore be used to assess the quality of the visualization tools provided in ESDA software.
A full assessment of the visualization tools in SAGE is contained in Haining et al (1998b). The figures here illustrate some of the key features of the system with comments on their strengths and weaknesses. The data used relate to the uptake of the breast cancer screening service in Sheffield. Enumeration district level data (there are 1159 EDs in Sheffield) have been aggregated into approximately 300 areas so that the illustrations can be seen in the prints here. The aggregation (implemented in SAGE) grouped EDs according to similarity of Townsend deprivation score, whilst also trying to create areas of similar population size, with areal compactness as a secondary requirement (for details see Wise et al 1997).
Figure 2: Screenshot to illustrate some features of SAGE which facilitate the exploration of data
Figure 2 shows an example of some of the features of SAGE which facilitate exploration of the basic properties of the data. The boxplot shows the distribution of uptake rates. It has been used for a table look up operation, namely to determine the value of the lowest rate. This task can be assisted in three ways:
The linked windows facility makes it easy to see where in Sheffield this outlier is located, and gives a second method of determining the uptake rate, by highlighting the row in the table.
Figure 3: Illustration of linked windows using map and box plot
The breast cancer screening service is provided at a single location in Sheffield (near one of the major hospitals). It is therefore of interest to see whether distance from this centre affects the proportion of women who use the service, i.e. whether there is a strong SMOOTH element in the spatial pattern of uptake rates. The graph on the right of Figure 3 shows a series of boxplots of the uptake rate, calculated for zones lying at increasing lag distances from the zone containing the centre. The zones at lag three have been highlighted (by selecting the entire boxplot in the right hand window), showing that lag is a reasonable proxy for distance from the screening centre, at least up to lag 3. The graph shows that, perhaps surprisingly, distance from the screening centre does not appear to be a strong factor in determining whether women use the service.
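The grouping of zones by lag can be sketched as follows. This is not SAGE code. It assumes that "lag" means contiguity order, i.e. the minimum number of zone boundaries crossed to reach a zone from the one containing the centre; the zone names, adjacency lists and rates are hypothetical.

```python
from collections import defaultdict, deque

def lag_orders(neighbours, origin):
    """Breadth-first search over a contiguity graph.

    Returns {zone: lag}, where lag is the minimum number of boundary
    crossings needed to reach the zone from `origin` (which has lag 0).
    Treating lag as contiguity order is an assumption of this sketch.
    """
    lag = {origin: 0}
    queue = deque([origin])
    while queue:
        zone = queue.popleft()
        for nb in neighbours[zone]:
            if nb not in lag:
                lag[nb] = lag[zone] + 1
                queue.append(nb)
    return lag

def rates_by_lag(rates, lag):
    """Group uptake rates by lag so that each group can feed one boxplot."""
    groups = defaultdict(list)
    for zone, order in lag.items():
        groups[order].append(rates[zone])
    return dict(sorted(groups.items()))

# Hypothetical six-zone layout; zone "A" contains the screening centre.
neighbours = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D", "F"],
    "F": ["E"],
}
rates = {"A": 72.0, "B": 70.0, "C": 74.0, "D": 71.0, "E": 73.0, "F": 69.0}

lag = lag_orders(neighbours, "A")
groups = rates_by_lag(rates, lag)
print(groups)
```

Each list in `groups` corresponds to one boxplot in a display of this kind. If the medians of successive groups show no steady rise or fall, that suggests only a weak trend with distance from the centre.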
Figure 4: Illustration of linked windows using map and moran plot
An alternative possibility is that women are influenced by the social and economic conditions in their neighbourhood. One way to assess this is to look for spatial clustering.
The graph in Figure 4 is a Moran plot in which values for a region are plotted on the Y axis, and average values in neighbouring regions on the X axis. The presence of a positive trend in this graph is evidence of positive spatial autocorrelation, another form of SPATIAL SMOOTH pattern in the data. However, there are also some regions which depart from this positive relationship, and these are spatial outliers. Six regions have been selected on the graph (the six at the bottom of the graph) in which the uptake rate is lower than in neighbouring areas. However, as the map shows, these outliers are scattered across the city.
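The quantities behind a Moran plot can be sketched in a few lines. This is an illustration rather than SAGE's implementation: it assumes row-standardised weights (each neighbour counts equally), expresses both axes as deviations from the overall mean, and flags as low spatial outliers those regions that lie below the mean while their neighbours average above it. The five-region chain and its values are hypothetical.

```python
def moran_plot_points(values, neighbours):
    """Return {region: (x, y)} for a Moran plot.

    y is the region's own value and x the average value of its
    neighbours, both as deviations from the overall mean. The axes
    follow the description in the text (own value on Y).
    """
    mean = sum(values.values()) / len(values)
    points = {}
    for region, nbs in neighbours.items():
        nb_mean = sum(values[n] for n in nbs) / len(nbs)
        points[region] = (nb_mean - mean, values[region] - mean)
    return points

def morans_i(points):
    """Moran's I for row-standardised weights: sum(z * lag) / sum(z ** 2)."""
    num = sum(x * y for x, y in points.values())
    den = sum(y * y for _, y in points.values())
    return num / den

def low_spatial_outliers(points):
    """Regions below the mean whose neighbours average above the mean.

    This quadrant rule is one possible definition, assumed for the sketch.
    """
    return sorted(r for r, (x, y) in points.items() if y < 0 and x > 0)

# Hypothetical chain of five regions A-B-C-D-E with made-up uptake rates.
values = {"A": 70.0, "B": 72.0, "C": 60.0, "D": 74.0, "E": 74.0}
neighbours = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}

points = moran_plot_points(values, neighbours)
i_stat = morans_i(points)
outliers = low_spatial_outliers(points)
print(i_stat, outliers)
```

A positive value of `i_stat` corresponds to the positive trend described above; the regions returned by `low_spatial_outliers` are the kind of points that would sit at the bottom of the graph.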
ESRC grant number R000234470 "Developing spatial statistical software for the analysis of area based health data linked to a GIS" enabled the development of SAGE. A grant from the Joint Information Systems Committee (JISC) and the ESRC made possible the visualization assessment. Thanks to Jingsheng Ma for the development of SAGE and to Dawn Thompson for the use of the breast cancer screening uptake data.
Cleveland W.S. (1994) The elements of graphing data. AT&T Bell Laboratories, Murray Hill NJ.
Haining R.P., Wise S.M. and Ma J. (1998a) Exploratory spatial data analysis in a Geographic Information System Environment. The Statistician (in press).
Haining R.P., Wise S.M. and Signoretta P. (1998b) Providing scientific visualization for spatial data analysis: criteria and an assessment of SAGE. Paper presented at the 38th Congress of the European Regional Science Association, Vienna, Aug 28th-Sept 1st 1998.
Wise S.M., Haining R.P. and Ma J. (1997) Regionalisation tools for the exploratory spatial analysis of health data. In Fischer M. and Getis A. (Eds) Recent Developments in Spatial Analysis: Spatial statistics, behavioural modelling and neuro-computing. Springer-Verlag, Berlin, pp. 83-100.
For further details on SAGE see: http://www.shef.ac.uk/~scgisa