Abstract
The paper derives from a project (part of the ESRC-sponsored Census Initiative) which set itself both methodological and substantive challenges.
The solution devised here centres on creating "synthetic data", which provides the basis for phase two of the method by taking the initial, phase one, analyses as input. Each of these analyses produces a classification of all parts of the country (viz. the 10,529 wards, or sectors in Scotland, in the 1991 Census). Such a classification identifies which of these 'building block' areas are grouped together as a single region in a given set of commuting or migration regions (or whatever the classification represents). The key information in each classification can thus be re-expressed as a binary matrix of 10,529 × 10,529 cells, in which each cell records whether or not the two areas concerned fall within the same region (the matrix is inherently symmetrical, so only half of it is needed).
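The re-expression of a single classification in binary form can be illustrated with a minimal sketch. It is not the project's own code: the function name `comembership_pairs`, the five-area toy classification and the choice to store only the pairs of areas that share a region (in effect the non-zero cells of one half of the symmetrical matrix) are all assumptions made for illustration.

```python
from itertools import combinations


def comembership_pairs(labels):
    """Return the set of area pairs (i, j), with i < j, that share a region.

    `labels[k]` is the region assigned to building-block area k in one
    classification. Each returned pair corresponds to a cell holding 1 in
    the upper half of the binary matrix; every other cell is implicitly 0.
    """
    by_region = {}
    for area, region in enumerate(labels):
        by_region.setdefault(region, []).append(area)

    pairs = set()
    for members in by_region.values():
        # Members are listed in ascending order, so each pair has i < j.
        pairs.update(combinations(members, 2))
    return pairs


# Hypothetical classification of five areas into two regions.
labels = ["A", "A", "B", "B", "A"]
pairs = comembership_pairs(labels)
```

Holding only the pairs that are grouped together avoids storing the full matrix of 10,529 × 10,529 cells, most of which would be zero in any one classification.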
The crucial benefit from re-expressing each separate classification in this binary form is that these matrices can then be cumulated to produce the synthetic data needed. In GIS terms, it is analogous to layering the sets of boundaries on top of each other and counting the number of layers in which there is no boundary between each pair of areas. This approach thus provides an assessment of the 'strength of evidence' that two areas should be grouped together. The final synthetic dataset is, then, an ideal basis for the second phase of the definitional procedure - and it can be analysed with a version of ERA which has been optimised for this purpose. Other forms of analysis of the synthetic dataset have also been examined.
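The cumulation step can be sketched in the same spirit. This is an illustrative outline under stated assumptions, not the procedure actually used: the function name `synthetic_counts`, the three toy classifications and the equal weighting of every layer are hypothetical. For each pair of areas, the count is the number of classifications in which no boundary separates them.

```python
from collections import Counter
from itertools import combinations


def synthetic_counts(classifications):
    """Count, for each pair of areas, the layers that group them together.

    `classifications` is a list of label lists, one per classification,
    all indexed by the same building-block areas. The result maps each
    pair (i, j), with i < j, to its 'strength of evidence' count; pairs
    never grouped together are absent and read as 0.
    """
    counts = Counter()
    for labels in classifications:
        by_region = {}
        for area, region in enumerate(labels):
            by_region.setdefault(region, []).append(area)
        for members in by_region.values():
            # One increment per layer for every pair inside the same region.
            counts.update(combinations(members, 2))
    return counts


# Three hypothetical layers over the same four areas (equal weights assumed).
commuting = ["A", "A", "B", "B"]
migration = ["X", "X", "X", "Y"]
local_authorities = ["P", "P", "Q", "Q"]

counts = synthetic_counts([commuting, migration, local_authorities])
```

In this toy case areas 0 and 1 are grouped together in all three layers, whereas areas 0 and 3 are never grouped together, so the evidence for placing them in the same locality is correspondingly strong and absent.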
The methodological innovation of creating synthetic data removes the technical limitations which arise from relying upon a single analysis of a single dataset. In particular, the huge benefit of the synthetic data method is the ability to draw upon analyses of different datasets. Virtually all previous regionalisations have centred on the analysis of a single dataset of flows between areas (most usually commuting flows, but sometimes migration). The synthetic data, however, can draw upon the evidence of many different sets of flows. The synthetic dataset has also been enriched by taking as a further form of input a range of existing sets of boundaries - such as local authority areas - because these are also indicative of which areas might be better kept together and which kept separate.
The paper will illustrate the value of GIS in compiling the synthetic data, for the 10,529 areas, from a large number of boundary sets which were originally defined in terms of many different sets of 'building block' areas. More fundamentally, it is near certain that such a method would not only have been scarcely practicable prior to the diffusion of GIS techniques, but would not even have been conceived of without the GIS-based experience of overlaying one boundary set on top of another. Thus it is GIS which has helped to stimulate this innovative methodology, which is based on visualising localities as areas cut through by relatively few of the many existing and definable sets of boundaries. Any of these boundary sets may be relevant to this project's concern to identify localities for use by researchers with a wide range of interests.