Stuart Barr and Mike Barnsley
Department of Geography, University of Wales Swansea, Singleton
Park, Swansea, SA2 8PP, U.K.
Email: sbarr@swan.ac.uk / mbarnsle@swan.ac.uk
Many of the recent advances in the computational recognition and analysis of spatial patterns present in geographically referenced digital data-sets have utilised approaches from the broad field of pattern recognition (Openshaw, 1994). In particular, statistical and neural pattern recognition paradigms (Schalkoff, 1992) have been used to improve the existing spatial analytical inferential capabilities of many Geographical Information Systems. For example, Openshaw (1994) outlines several statistical and neural approaches to recognition and analysis of a variety of well known spatial concepts often found to be present in geographical digital data-sets. These included, approaches for the recognition and analysis of spatial autocorrelation, the recognition of distance decay patterns and the recognition of spatial association patterns (Openshaw, 1994). In fact, statistical and neural pattern recognition approaches have proved to be such powerful tools for the recognition and analysis of spatial patterns in geographical digital data-sets, that they are now routinely utilised in many geographically-based spatial inferential classification problems, for example, the spatial clustering and spatial classification of geographically referenced digital census information (Openshaw et al, 1995).
However, although many geographical spatial classification problems have been successfully addressed using techniques and approaches from the fields of statistical and neural pattern recognition, they may not be the most appropriate means of spatially classifying all forms of geographical digital data-sets. In particular, statistical and neural pattern recognition approaches may be inappropriate where the geographical data-set under investigation exhibit one or both of the following characteristics. Firstly, they may be inappropriate where the spatial patterns of interest are represented only implicitly in the form of a digital map data-set, for which no spatial or aspatial geographically referenced attribute information is available other than perhaps the nominal or ordinal labelling of the principal map entities present; in situations such as this, the spatial attributes which are to be used in the spatial classification process must be directly derived as result of processing the digital map data-set. Secondly, they may also be inappropriate in situations where the spatial patterns of interest are expressed primarily in a binary relational manner rather than in the form of a quantitative measurement or value; in these cases, statistical and neural approaches may be inappropriate, because they often require the spatial attributes which are to be used in the spatial classification process to be in quantitative feature vector form (Schalkoff, 1992).
There does exist, however, a third general field of pattern recognition, which has received very little attention from the GIS community, which may be appropriate for the development of spatial classification procedures for the type geographical digital data-sets describe above, namely, that of graphical structural pattern recognition. In general, the field of graphical structural pattern recognition is concerned with the development of computational approaches which can derive, represent and analyse both the intrinsic structure of a series of entities or pattern primitives, as well the extrinsic relationships which exit between them. Often, this process of deriving, representing and analysing the structural properties and relationships of pattern primitives, is performed in order to facilitate their quantitative description or classification (Schalkoff, 1992; Gonzales and Woods, 1993). One important feature of graphical structural pattern recognition, is that graph-theoretic approaches and concepts are used to represent the properties and relations of the pattern primitives. In particular, many graphical structural pattern recognition approaches utilise the concept of an attribute graph, which extends the relational graph concept in such a way that they are able to represent both multiple sets of intrinsic properties and multiple sets of extrinsic relations (Ballard and Brown, 1982, Schalkoff, 1989; Schalkoff, 1992). This extension of relational graphs, often allows very complex graph-based descriptions on structure exhibited by a series of pattern primitives to be derived and represented.
From this description of graphical structural pattern recognition, there would seem exist a number of conceptual parallels between this field of pattern recognition and that of using GIS to perform spatial classification tasks. For example, both involve the representation and description of a set of fundamental entities or pattern primitives. In the GIS context, this is often the geographically referenced points, arcs and polygons that may be present a digital data-set. Similarly, both involve the derivation or representation of a series of pattern features which describe the pattern primitives under investigation. Again, in the GIS context, these pattern features are often either existing explicit geographically referenced spatial attributes or attributes which have been derived directly as a result of processing the pattern primitives present. Finally, both are concerned with the performing a classification which assigns the pattern primitives present, on the basis of the their derived pattern features, to one of set of candidate pattern classes; often, this classification process involves some form of explicit spatial analytical functionality.
In this paper, graphical structural pattern recognition approaches are assessed in relation to their suitability for the development of spatial classification approaches for the type geographical digital data-sets described above. In particular, an attempt is made to perform the quantitative structural separation of a series of urban land use pattern classes (i.e., pattern classes relating to different residential land use patterns and industrial/commercial activities) recognised to be present in a urban land cover digital map data-set. The graphical structural separation of the land use pattern classes is attempted using only structural features (i.e., morphological properties and spatial relations) which have been derived directly as result of processing the urban land cover digital map data-set under investigation. In order to perform this task, a graphical structural pattern recognition system called SAMS (Structural Analysis and Mapping System) is used to derive, represent and analyse the structural properties and spatial relations that are present in the urban land cover digital map data-set.
The SAMS structural analysis and mapping system used in this study was originally developed for the structural analysis of satellite remotely-sensed images, studies of which can be found in Barr (1992) and Barr and Barnsley (1995). Theoretical and conceptual issues relating to the development of the SAMS are describe in full in Barr and Barnsley (1997) and Barr and Barnsley (1998). However, in order to provide a context for the issues raised latter in this paper, a brief description of this system is provided below.
SAMS is a region-based structural analysis system which operates on raster digital map data-sets measured on either an ordinal or nominal (categorical) scale. Upon presentation of a raster digital map data-set, SAMS processes this to derive each unique region present, where a region is defined as the complete set of contiguous (topologically touching) raster elements (pixels) which have the same ordinal or nominal label (value). The derived regions form the pattern primitives of interest within SAMS and are used as the basis of all subsequent processing. Each region present in a raster digital map data-set is recognised by means of a simple contour-tracing algorithm (Gonzalez and Wintz, 1987; Gonzalez and Woods, 1993), represented in terms of its boundary using Freeman chain-codes (Freeman, 1975), and stored in the form of a Region Search Map (RSM; Barr and Barnsley, 1997).
Once a boundary description of each region has been derived, SAMS further process this geometric information to derive a series of user specified structural pattern features. The pattern features derived for each region pattern primitive may take a wide variety of forms and measurement scales, although in general, they tend to describe the morphology of regions (e.g., their area, perimeter, compactness etc) as well as the spatial relationships which exist between them (e.g., adjacency, containment, distance and direction etc). The derived structural features are represented and encoded using an updatable graph-theoretic data model known as XRAG (eXtended Relational Attribute Graph). The XRAG data model, a comprehensive and formal description of which can be found in Barr and Barnsley (1997), represents the structure (i.e., structural features) of a set of regions embedded in a raster digital map data-set in terms of seven logical sets, defined by the heptuple:
where
A schematic representation of the XRAG data model and its associated seven logical sets is shown in Figure 1. In the XRAG model, each unique region in a RSM file is represented as a node. The existence of a relationship between any two regions (nodes) for a given relation (e.g., adjacency or containment) is represented by means of an edge. As such, the node set in combination with a single specific relation is equivalent to a standard relational graph (Barr and Barnsley 1997). Non-relational properties of the regions (e.g., their area and perimeter) are represented as attributes of the nodes, while relational properties (e.g., the distance and cardinal direction (orientation) between any two regions) are represented as attributes of the edges.
In addition to the XRAG data model, a wide range of structural analytical algorithms have been developed within SAMS for the analysis the structural features derived and represented. The actual derivation and representation of structural features in a XRAG is performed using a master compilation program which can derive and encoded over 20 unique structural properties and relations from a raster digital map data-set. The structural analytical algorithms that have been developed to search and analyse the structural features represented in a XRAG perform a variety of structural pattern recognition tasks often by means of utilising graph-theoretic search strategies and graph-theoretic concepts. Finally, a graph visualisation tool, called VGRAPH, has also been developed which can directly interface with any of the structural relations represented in a XRAG and visualise these in terms of a relational graph showing the nodes and edges (relationships) that exist.
In this study, Ordnance Survey 1:1,250 scale Land-Line.93+ digital map data covering a 2Km by 2Km area of part of the town of Orpington in the Borough of Bromley, south-east London, U.K. was used to asses the potential of SAMS to perform the spatial inferential classification of urban land use information. This digital map data-set, originally supplied in a vector format, was topologically-structured to generate a series of polygonal coverages, each of which related to a single land cover type of either, Tarmac (representing all road features), Built (representing all building features) Water (representing small water bodies), Tree (representing areas of extensive woodland, but not individual trees within the urban area ) or Grass (denoting all other areas of "open space"). Each polygon coverage was generated by selecting and deriving the sub-set of Ordnance Survey features classes which related to a particular land cover type. For example, the Built coverage was derived by selecting the lines (arcs) which comprised the Ordnance Survey feature classes of Building Outlines, Building Pecks and General Line Detail. Once each land cover coverage had been derived and polygon topology created, these were then combined to provided a single polygonal land cover coverage. This was subsequently converted from vector to raster format in order to derive a raster digital map suitable for analysis using SAMS. The resulting raster land cover digital map data-set, which had a cell (pixel) size of 1m, is shown in Figure 2.
The land use pattern classes which were structurally analysed in this study, were selected by means of a visual analysis of the digital map data-set of Figure 2. The analysis of these data-set, suggested that this particular urban area was comprised of several different urban land use types, ranging from different residential densities through to a number of subtly different zones of urban utilities (i.e., hospital complexes and schools etc). Three predominant urban residential land use types were recognised to be present, namely, (i) areas of 1930s semi-detached houses, (ii) housing estates of late-1970s/early-1980s development, dominated by small blocks of low-rise flats and three-storey `town houses' and (iii) areas of 1990s residential development comprising small, detached houses. In addition to these three residential urban land use types, a spatially extensive hospital complex was also selected as being characteristic of the utility-based land uses present in this area. The selection of a pattern class for each of these four land use types, was performed by visually selecting a single representative sample area from the land cover digital map data-set. The four sample areas selected, each measuring 250m by 250m, are shown in Figure 3. A visual inspection of each land use area in Figure 3, indicates that discernible differences seem to exist between them, in terms of the size, shape and spatial distribution of land cover types present.
In order to measure quantitatively the degree of structural separability which existed between the four land use pattern classes selected, each sample area was processed using SAMS in order to derive a series of structural features. Although SAMS can be used to derive a wide range of structural features, it was decided to derive a relatively small number of well understood morphological properties and spatial relations. In particular, the land cover regions which comprised each land use pattern class, were processed to derive the morphological properties of area and compactness and also the spatial relations of adjacency, containment and proximity distance. These structural features were represented in a XRAG for each land use pattern class and formed the basis of the subsequent analysis undertaken to measure their degree of structural separability.
A number of studies in the general field of pattern recognition, have suggested that powerful pattern representation and pattern discrimination capabilities may be achieved by combining both structural and statistical pattern recognition approaches (Fu, 1986; Schalkoff, 1992). As such, it was decided for the purposes of this study to use such a hybrid structural/statistical pattern recognition approach in an attempt to quantitatively measure the degree of structural separability which existed between the four land use pattern classes. In order to achieve this, the structural features of each land use pattern class were processed in order to derive a series of statistical descriptive measures which characterised the distribution of values in each structural feature.
Statistical descriptions on the morphological properties of area and compactness and the spatial relational property of proximity distance were easily derived by calculating the mean and variance of the values present in each of these structural features. However, the derivation of statistical descriptions on the spatial relations of adjacency and containment involved a number of further processing stages. This is because, these spatial relations represent only binary relational information (i.e., information on whether a spatial relationship exists between any two given land cover regions). Figure 4, shows a series of graph visualisations which represent the binary relationships which exist for the spatial relation of adjacency in each of the four urban land use pattern classes. Visually at least, there would seem to be clear differences in the spatial patterns exhibited in each of these graphs. However, what is required is some form of quantitative measurement which characterises the spatial patterns visually apparent in each of these graphs. A number of potential quantitative measurements have been suggested as appropriate means of describing the nature of spatial relationships which are represented in the form of a relational graph (Kruger, 1979a; Kruger, 1979b). One of the simplest of these measurements, is that of node degree, which expresses the number of edges (relationships) which are incident (occur) for any node in a graph. In this study, node degree information was calculated for each spatial relation. This information, was subsequently processed to calculate the mean node degree and and node degree variance of each spatial relation for each land use pattern class. The statistical description of each structural feature (i.e., those which pertained to both morphological properties and spatial relations) was derived in terms of both an aggregation of all the land cover types present in each land use pattern class and also on a cover-type by cover-type basis. This processing resulted in a total of 144 separate statistical measurements which characterised the structural composition of the selected land use pattern classes (36 for pattern class).
Further analysis of the derived statistical descriptions, revealed that only small sub-set of these measurements were likely to aid the quantitative structural separation of the land use pattern classes. Many structural features had to be discard, because it was not possible to calculate either their mean or variance. For example, the number Built nodes in each land use pattern class was found to be considerably different. However, due to the selection of only a single sample area for each land use, it was not possible to calculate the mean and variance of this structural feature. Further structural features had to be disregard from the separability analysis, due to their mean and variance values being obtained from a prohibitively small number of samples. In these situations, the resulting mean and variance values could not be considered as reliable statistical measurements. For example, the mean and variance values derived for the node degree of containment in terms of the Grass land cover type, was found to be considerably different for each of the land use pattern classes. However, these values were derived from between only two and five samples in the case of any single land use pattern class. Clearly, while the resulting statistical values may well indicate a strong degree of structural separability, they cannot be considered as reliable measurements, due to the small number of samples they were derived from. As a result of this analysis, a final sub-set of the statistical measurements derived was selected to form the basis of the structural separability analysis. This sub-set, consisted of the statistical measurements relating to the structural features of (i) the area of the Built regions in each class, (ii) the compactness of the Built regions in each class, and (iii) the nearest node distance between Built regions in each class. Table 1, presents summary statistical information on each of these structural features for each land use pattern class.
Feature | 1930s | 1980s | 1990s | Hospital |
Built Mean Area | 106.49 | 269.89 | 78.84 | 644.43 |
Built Area Standard Deviation | 51.49 | 204.58 | 32.49 | 1707.97 |
Built Mean Compactness | 2.42 | 3.00 | 2.46 | 5.20 |
Built Compactness Standard Deviation | 0.76 | 1.48 | 0.76 | 8.87 |
Built Mean Nearest Node Distance | 15.74 | 25.53 | 13.65 | 31.57 |
Built Nearest Node Distance Standard Deviation | 3.87 | 6.58 | 4.06 | 17.47 |
Transformed divergence analysis was used to measure the degree of structural separability which existed between the land use pattern classes. For any combination of land use pattern classes, the calculated transformed divergence value represents a measure of their dissimilarity. The value obtained by calculating the transformed divergence between any two land use pattern classes, falls in the range of 0 to 100, where a value 0 shows that they a perfectly similar and a value of 100 shows that they are perfectly dissimilar or perfectly separable; a value which lies somewhere between this range can be interpreted as having a behaviour similar to probability of performing a correct classification (Moik, 1980; Mather, 1987). As such, if very high transformed divergence values are obtained for a give pairwise combination of land use pattern classes, this would suggest that they exhibit a large degree of structural separability. Inversely, if the transformed divergence value is very low, this would suggest that they share a large degree of structural similarity.
Overall, the average transformed divergence value between all land use pattern classes was found to be 92.58; a value which suggests, in general, that large differences exist between the structural composition of the land cover types that comprise each of the four land use pattern classes. As such, one may be tempted, on the basis of this very high average transformed divergence value, to concluded that the four land use pattern classes are, in structural sense, separable. However, Table 2 shows that certain combinations of land use pattern classes, result in transformed divergence values which vary considerably from that of the average. In a positive sense, very high transformed divergence values were obtained for symmetric combinations of 1930s/1980s, 1930s/Hospital, 1980s/1990s, 1980s/Hospital and 1990s/Hospital; these values, all of which are greater than 99.00, clearly show that in a statistical sense these combinations of land use pattern classes are almost completely structurally separable on the basis of the morphological properties and the spatial pattern exhibited by the Built regions. However, a very low transformed divergence value (26.93) was derived for the 1930s/1990s combination of land use pattern classes. This value suggests, that these land use pattern classes share a high degree of structural resemblance in terms of the selected structural features used in this study.
Type | 1930s | 1980s | 1990s | Hospital |
1930s | * | 99.17 | 26.93 | 100.00 |
1980s | 99.17 | * | 100.00 | 99.79 |
1990s | 26.93 | 10.00 | * | 100.00 |
Hospital | 100.00 | 99.79 | 100.00 | * |
In order to explain why the transformed divergence value for 1930s/1990s land use pattern classes was so low, a Mann-Whitney U test was performed on the statistical measures derived for the structural features in these two land use pattern classes. This revealed, that only the structural feature of Built area could be considered to be statistically different at a significance level of 0.001. In the case of both the structural features of Built compactness and Built proximity distance, the resulting Mann-Whitney U values were considerably above the critical value required at the 0.001 significance level for them to be considered statistically different (The Mann-Whitney U test considers there to be significant statistical difference between two distributions, when the calculated U value is below the critical value required for any given significance level). As such, it would seem that for these two land use pattern classes the structural features of Built compactness and Built proximity distance prohibit their structural separability in a quantitative manner. A visual analysis of the two land use pattern classes in Figure 2, clearly shows that a number similarities do exist in terms of the shape of the built regions and also in terms of the distances between these. However, it is also evident from Figure 2 and graph visualisations of these land use pattern classes in Figure 3, that these pattern classes, visually at least, exhibit different structural compositions; clearly, alternative structural features to ones selected in this study would need to be found which expressed these differences and thus allowed their structural separation. Mann-Whitney U tests were further performed for all other combinations of land use pattern classes, in order verify the results obtained from the transformed divergence analysis. In all cases, the resulting Mann-Whitney U values were found to be lower than required critical value at the 0.001 significance level. As such, this analysis further strengths the assertion, that all land use pattern class combinations, apart from that of the 1930s/1990s, are quantitatively structurally separable on the basis of the selected structural features of Built area, Built compactness and Built proximity distance.
This paper has demonstrated that graphical structural pattern recognition approaches can potentially be used to perform the spatial classification of geographically referenced digital map data-sets for which no existing spatial attributes are available and for which the implicit spatial patterns present are characterised in a relational manner. In particular, it has been shown that a graphical region-based structural pattern recognition system called SAMS, which utilises a graph-theoretic data model known as XRAG, can be used to derive, represent, analyse and quantitatively express the morphological properties and spatial relations exhibited by land cover regions in digital map data-sets. Furthermore, by combining this structural pattern recognition system with series of approaches more commonly associated with statistical pattern recognition, it has been shown that is possible to obtain a quantitative measurement on the degree of structural separability which exists between a series urban land use pattern classes using the structural information derived. In all but one case, the combination of these approaches allowed the statistical separation of the structural patterns present in the land use pattern classes to be achieved.
However, the results presented should be treated with a degree of caution, since they are based on an analysis of a single sample area for each of the four land use pattern classes recognised. It is possible, for example, that different sample areas of the same nominal land use might exhibit a somewhat different structural composition in terms of their component land cover regions. If the within-class variation (i.e., within a single land use) is greater than the between-class variation (i.e., between different land uses), then it will not be possible to identify and to distinguish urban land use consistently on the basis of these structural measures. Similarly, the relatively marked differences in structural properties and relations observed in this study may be due to the comparatively small number of land-use categories considered here; greater overlap may, in fact, exist if the full range of urban land uses were examined. Each of these issues requires further, detailed study.
Finally, a number of important developmental issues remain to be addressed, particularly in relation to development of automated computational approaches for selection of the most appropriate structural features. In the case of this study, this task was achieved in non-computational manner. Clearly, it would be desirable to develop and implement approaches within SAMS which removed this subjective non-computational component. Finally, further work is still required to develop a complete graphical structural pattern recognition classification system which can perform the semi-automatic or automatic spatial classification of digital map data-sets such the one used in this study. The development of such a tool is currently under investigation.
The authors would like to acknowledge the support of the U.K. Natural Environment Research Council through grant number GR3/10186. The digital map data used in this study are reproduced from the Ordnance Survey 1:1,250 scale Land-Line 93+ data set with the permission of the Controller of Her Majesty's Stationary Office, Crown Copyright. We are grateful to Peter Blamire and Thomas Bauer (Department of Geography, University of Wales Swansea) for their assistance in pre-processing these data.
Ballard, D. H., and Brown, C.M., 1982. Computer Vision. Prentice Hall.
Barnsley, M.J., and Barr, S.L., 1997. Distinguishing urban land-use categories in fine spatial resolution land-cover data using a graph-based structural pattern recognition system. Computers, Environment and Urban Systems, 21(3/4), 209-225.
Barr, S.L., and Barnsley, M.J., 1995. A spatial modeling system to process, analyse and interpret multi-class thematic maps derived from satellite sensor images. In Fisher, P. (ed.), Innovations in GIS 2, Taylor and Francis, 53-65.
Barr, S.L., and Barnsley, M.J., 1997. A region-based, graph-theoretic data model for the inference of second-order thematic information from remotely-sensed images. International Journal of Geographical Information Science, 11, 555-576.
Barr, S.L., and Barnsley, M.J. 1998. A syntactic pattern-recognition paradigm for the derivation of second-order thematic information from remotely-sensed images. In Atkinson, P.M., and Tate, N.J. (eds.), Advances in Remote Sensing and GIS Analysis. John Wiley, (in press).
Freeman, J., 1975. The modeling of spatial relations. Computer Graphics and Image Processing, 4, 156-171.
Fu, K.S., 1986. A step towards unification of syntactic and statistical pattern recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8(3), 398-404.
Gonzalez, R.C., and Wintz, P., 1987. Digital Image Processing. Addison-Wesley.
Gonzalez, R.C., and Woods, R.E., 1993. Digital Image Processing. Addison-Wesley.
Kruger, M.J.T., 1979a. An approach to built-form connectivity at an urban scale: system description and its representation. Environment and Planning B, 6, 67-88.
Kruger, M.J.T., 1979b. An approach to built-form connectivity at an urban scale: variations of connectivity and adjacency measures amongst zones and other related topics. Environment and Planning B, 6, 305-320.
Mather, .P.M, 1987. Computer Processing of Remotely Sensed Images. John Wiley.
Moik, J.G., 1980. Digital Processing of Remotely Sensed Images. Scientific and Technical Information Branch, National Aeronautics and Space Administration (NASA).
Openshaw, S., 1994. A concepts-rich approach to spatial analysis, theory generation, and scientific discovery in GIS using massively parallel computing. In Worboys, M.F. (ed.), Innovations in GIS 1. Taylor and Francis, 123-137.
Openshaw, S., Blake, M., and Wymer, C., 1995. Using neurocomputing methods to classify Britain's residential areas. In Fisher, P. (ed.), Innovations in GIS 2. Taylor and Francis, 97-111.
Piff, M., 1992. Discrete Mathematics: An Introduction for Software Engineers. Cambridge University Press.
Ross, K.A., and Wright, C.R.B., 1992. Discrete Mathematics, Prentice Hall.
Schalkoff, R.J., 1989. Digital Image Processing and Computer Vision. John Wiley.
Schalkoff, R.J., 1992. Pattern Recognition: Statistical, Structural and Neural Approaches. John Wiley.
Sonka, M., Hlavac, V., and Boyle, R., 1993. Image Processing, Analysis and Machine Vision. Chapman and Hall.