The role of visualization in the exploratory spatial data analysis of area-based data

Stephen Wise, Robert Haining and Paola Signoretta
Department of Geography and Sheffield Centre for Geographic Information and Spatial Analysis,
University of Sheffield, United Kingdom.

1. Introduction

A consistent theme in recent work on developing exploratory spatial data analysis (ESDA) has been the importance attached to visualization techniques, often justified in two ways

Numerous software packages have been developed which provide visualization facilities to help with the analysis of area data. This poster will use one developed at Sheffield to present:


Figure 1: Screenshot of a SAGE session

Figure 1 illustrates some of the key features of the SAGE system:

SAGE provides a range of graphical and numerical tools for undertaking ESDA. In order to assess the effectiveness of these tools, a conceptual model has been developed which has two elements:

3. Data model for ESDA

Exploratory Spatial Data Analysis has certain key characteristics:

Spatial data can be modelled as having two components:


Both the spatial and non-spatial elements of spatial data can be considered to have these two components as shown in the table below:

             Smooth                           Rough                             

Non-Spatial  Properties of distribution e.g.  Outliers in distribution          
             median, interquartile range                                        

Spatial      Trend                            Localised clusters of high        
             Spatial autocorrelation          values;                           
                                              Spatial outliers                  
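The non-spatial row of the table can be illustrated with a minimal sketch. It assumes that outliers are flagged with the conventional boxplot rule (values more than 1.5 interquartile ranges beyond the quartiles); that rule, the function name smooth_and_rough and the sample rates are illustrative assumptions, not taken from SAGE or from the Sheffield data.

```python
from statistics import quantiles

def smooth_and_rough(values, k=1.5):
    """Summarise a batch of values as 'smooth' (median, IQR) and 'rough' (outliers).

    Assumption: outliers are values lying more than k * IQR beyond the quartiles,
    the usual boxplot convention. SAGE may use a different rule.
    """
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return {"median": q2, "iqr": iqr, "outliers": outliers}

# Hypothetical uptake rates (per cent) for nine areas, invented for illustration.
rates = [72, 75, 74, 78, 71, 76, 73, 77, 40]
summary = smooth_and_rough(rates)
```

Here the median and interquartile range describe the smooth part of the distribution, while the single very low rate is reported as rough.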

4. Conceptual Model of Visualization

According to Cleveland (1994), statistical graphs are used for two purposes, each of which requires the viewer to undertake one or more of three tasks:

Activity      Description          Perceptual Tasks required by viewer          

Table Look    Reading off value    scanning (relating the case to the axis),    
Up            for an individual    interpolating (estimating the value of the   
              case                 case from the tick marks on the axis)        
                                   matching (linking the case symbol back to    
                                   the key)                                     

Pattern       Identifying trends,  detection (recognizing how relationships     
Perception    patterns or          between values are coded on the graph e.g.   
              regularities in the  distances between symbols relate to          
              whole set of data    differences in values of observations)       
                                   assembly (grouping objects on the graph      
                                   together e.g. all cases relating to a given  
                                   estimation (of the differences between the   
                                   grouped cases e.g. Year 1 values tend to be  
                                   greater than those for Year 2).              

Good graphical displays can be defined as those that are 'easy to read' i.e. their design assists the viewer in undertaking the necessary perceptual tasks.

The model can be extended to maps and can therefore be used to assess the quality of the visualization tools provided in ESDA software.

5. Examples of Visualization in SAGE

A full assessment of the visualization tools in SAGE is contained in Haining et al. (1998b). The figures here illustrate some of the key features of the system, with comments on their strengths and weaknesses. The data used relate to the uptake of the breast cancer screening service in Sheffield. Enumeration district (ED) level data (there are 1159 EDs in Sheffield) have been aggregated into approximately 300 areas so that the illustrations can be seen in the prints here. The aggregation (implemented in SAGE) grouped EDs according to similarity of Townsend deprivation score, whilst also trying to create areas of similar population size, with a secondary requirement of areal compactness (for details see Wise et al. 1997).

Figure 2: Screenshot to illustrate some features of SAGE which facilitate the exploration of data

Figure 2 shows some of the features of SAGE which facilitate exploration of the basic properties of the data. The boxplot shows the distribution of uptake rates. It has been used for a table look up operation, namely to determine the value of the lowest rate. This task can be assisted in three ways:

  1. Re-sizing the window (to bring the graph closer to the axis).
  2. Turning on gridlines to assist in relating the point to the axis.
  3. Zooming in on the boxplot window (shown as a separate window on the right).

The linked windows facility makes it easy to see where in Sheffield this outlier is located, and gives a second method of determining the uptake rate, by highlighting the row in the table.

Figure 3: Illustration of linked windows using map and box plot

The breast cancer screening service is provided at a single location in Sheffield (near one of the major hospitals). It is therefore of interest to see whether distance from this centre affects the proportion of women who use the service, i.e. whether there is a strong SMOOTH element in the spatial pattern of uptake rates. The graph on the right of Figure 3 shows a series of boxplots of the uptake rate, calculated for zones lying at increasing lag distances from the zone containing the centre. The zones at lag three have been highlighted (by selecting the entire boxplot in the right-hand window), showing that lag is a reasonable proxy for distance from the screening centre, at least up to lag 3. The graph shows that, perhaps surprisingly, distance from the screening centre does not appear to be a strong factor in determining whether women use the service.
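The grouping of zones by lag can be sketched as follows. This is a minimal illustration which assumes that lag means the number of contiguity steps from the zone containing the centre, found by breadth-first search over a neighbour list; the zone names, the neighbour structure and the uptake values are hypothetical, and SAGE's own implementation may differ.

```python
from collections import defaultdict, deque

def lag_orders(neighbours, start):
    """Breadth-first search: the start zone is lag 0, its neighbours lag 1, and so on."""
    lag = {start: 0}
    queue = deque([start])
    while queue:
        zone = queue.popleft()
        for other in neighbours[zone]:
            if other not in lag:
                lag[other] = lag[zone] + 1
                queue.append(other)
    return lag

def group_by_lag(values, lag):
    """Collect the values of all zones at each lag, ready for one boxplot per lag."""
    groups = defaultdict(list)
    for zone, order in lag.items():
        groups[order].append(values[zone])
    return dict(sorted(groups.items()))

# Hypothetical five-zone example; "A" stands for the zone containing the centre.
neighbours = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
uptake = {"A": 70, "B": 72, "C": 68, "D": 75, "E": 71}

lag = lag_orders(neighbours, "A")
by_lag = group_by_lag(uptake, lag)
```

Each list in by_lag would feed one boxplot in a display such as that in Figure 3. If the medians show no steady change with lag, that suggests distance from the centre is not a strong factor.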

Figure 4: Illustration of linked windows using map and Moran plot

An alternative possibility is that women are influenced by the social and economic conditions in their neighbourhood. One way to assess this is to look for spatial clustering.

The graph in Figure 4 is a Moran plot in which the value for each region is plotted on the Y axis and the average value in neighbouring regions on the X axis. A positive trend in this graph is evidence of positive spatial autocorrelation, another form of SPATIAL SMOOTH pattern in the data. However, some regions depart from this positive relationship, and these are spatial outliers. Six regions have been selected on the graph (the six at the bottom of the graph) in which the uptake rate is lower than in neighbouring areas. As the map shows, however, these outliers are scattered across the city.
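The coordinates of a Moran plot, and the slope of the trend through them, can be computed along the following lines. This is a sketch under stated assumptions: neighbours are given as a simple list per region, the neighbour average is an unweighted mean (row-standardised weights), and the summary statistic is Moran's I. The function names and the four-region data are hypothetical, not taken from SAGE.

```python
from statistics import mean

def moran_plot_points(values, neighbours):
    """Return {region: (x, y)} with x = mean of neighbouring values, y = own value.

    The axes follow the description in the text (own value on Y, neighbour
    average on X).
    """
    return {
        region: (mean(values[n] for n in neighbours[region]), values[region])
        for region in values
    }

def morans_i(values, neighbours):
    """Moran's I with row-standardised weights (each neighbour weighted equally)."""
    overall = mean(values.values())
    z = {region: v - overall for region, v in values.items()}
    # Spatial lag of each region: average deviation among its neighbours.
    lagged = {region: mean(z[n] for n in neighbours[region]) for region in z}
    numerator = sum(z[region] * lagged[region] for region in z)
    denominator = sum(d * d for d in z.values())
    return numerator / denominator

# Hypothetical chain of four regions with steadily increasing values.
neighbours = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
values = {"A": 1, "B": 2, "C": 3, "D": 4}

points = moran_plot_points(values, neighbours)
i_stat = morans_i(values, neighbours)
```

A positive value of i_stat corresponds to the positive trend described above. Regions whose own value is far below their neighbour average (large x minus y) are candidates for the low-valued spatial outliers selected in Figure 4.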

6. An Assessment of SAGE

6.1 Range of tools

6.2 Quality of visualization facilities

7. Conclusions


ESRC grant number R000234470 "Developing spatial statistical software for the analysis of area based health data linked to a GIS" enabled the development of SAGE; a grant from the Joint Information Systems Committee (JISC) and the ESRC made possible the visualization assessment. Thanks to Jingsheng Ma for the development of SAGE and to Dawn Thompson for the use of the breast cancer screening uptake data.


Cleveland W.S. (1994) The elements of graphing data. AT&T Bell Laboratories, Murray Hill NJ.

Haining R.P., Wise S.M. and Ma J. (1998a) Exploratory spatial data analysis in a Geographic Information System environment. The Statistician (in press).

Haining R.P., Wise S.M. and Signoretta P. (1998b) Providing scientific visualization for spatial data analysis: criteria and an assessment of SAGE. Paper presented at the 38th Congress of the European Regional Science Association, Vienna, Aug 28th-Sept 1st 1998.

Wise S.M., Haining R.P. and Ma J. (1997) Regionalisation tools for the exploratory spatial analysis of health data. In M. Fischer and A. Getis (Eds) Recent Developments in Spatial Analysis: Spatial statistics, behavioural modelling and neuro-computing. Berlin, Springer-Verlag, p83-100.


For further details on SAGE see: