Unmixing Aggregate Data: Estimating the Social Composition of Enumeration Districts

Richard Mitchell, David Martin and Giles Foody
Department of Geography, University of Southampton, Southampton SO17 1BJ

This paper addresses the problem of interpreting and classifying aggregate data sources, and draws parallels between tasks commonly encountered in image processing and census analysis. Both fields already have a range of standard classification tools for such tasks, but these are hindered by the aggregate nature of the input data. An approach to 'unmixing' aggregate data, and thus revealing the sub-unit variation masked by aggregation, is introduced. The approach has already shown considerable success in Earth Observation applications. This paper presents its adaptation and application to Small Area Statistics (SAS) data for Southampton, Hampshire, revealing something of the social composition of Southampton's enumeration districts (EDs). The unmixing technique uses an artificial neural network (ANN).

In image processing, spectral reflectance from the ground is recorded pixel by pixel. A frequent task for researchers is to allocate each pixel to a land cover class on the basis of its multi-band reflectance. Difficulties arise, however, when land cover varies at a finer scale than that at which the data are recorded, so that pixels contain a mixture of land cover classes. Classifying such a 'mixed' pixel into a single, most likely land cover class therefore introduces error. The image processing solution has been to develop classification techniques that allow for multiple and partial class membership. A variety of techniques can accomplish this, including an ANN, which is trained so that the activation of each output node is proportional to the presence of a defined land cover class within the pixel. The target values used to train the ANN are typically derived from imagery of finer spatial resolution.
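To make the idea of partial class membership concrete, the following is a minimal sketch in Python. It assumes a linear mixing model with three hypothetical, idealised spectral signatures and uses the simplest possible network, a single softmax layer (equivalent to multinomial logistic regression trained on proportional rather than 0/1 targets). This is a stand-in for illustration, not the network architecture used in practice. The known fractions of the training pixels play the role of targets derived from finer-resolution imagery.

```python
import math
import random

# Hypothetical, idealised spectral signatures (3 bands) for 3 land cover classes.
SIGNATURES = [
    [0.9, 0.1, 0.1],  # class 0
    [0.1, 0.9, 0.1],  # class 1
    [0.1, 0.1, 0.9],  # class 2
]
N_CLASSES = len(SIGNATURES)
N_BANDS = len(SIGNATURES[0])


def mix(fractions, rng, noise=0.01):
    """Linear mixing model (an assumption): pixel = sum_k f_k * signature_k + noise."""
    return [
        sum(f * sig[b] for f, sig in zip(fractions, SIGNATURES)) + rng.gauss(0.0, noise)
        for b in range(N_BANDS)
    ]


def random_fractions(rng):
    """Class fractions drawn from a flat Dirichlet distribution."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(N_CLASSES)]
    total = sum(draws)
    return [d / total for d in draws]


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


class SoftClassifier:
    """Single-layer softmax network whose outputs are read as class proportions."""

    def __init__(self):
        self.w = [[0.0] * N_BANDS for _ in range(N_CLASSES)]
        self.b = [0.0] * N_CLASSES

    def predict(self, x):
        logits = [sum(wk[j] * x[j] for j in range(N_BANDS)) + bk
                  for wk, bk in zip(self.w, self.b)]
        return softmax(logits)

    def train(self, pixels, targets, lr=0.5, epochs=100):
        # SGD on cross-entropy against proportional targets: dLoss/dlogit_k = p_k - t_k.
        for _ in range(epochs):
            for x, t in zip(pixels, targets):
                p = self.predict(x)
                for k in range(N_CLASSES):
                    err = p[k] - t[k]
                    for j in range(N_BANDS):
                        self.w[k][j] -= lr * err * x[j]
                    self.b[k] -= lr * err


if __name__ == "__main__":
    rng = random.Random(42)
    fractions = [random_fractions(rng) for _ in range(200)]
    pixels = [mix(f, rng) for f in fractions]
    net = SoftClassifier()
    net.train(pixels, fractions)
    # A pixel that is mostly class 0; outputs are read as estimated proportions.
    print([round(v, 2) for v in net.predict(mix([0.8, 0.1, 0.1], rng))])
```

Because the outputs pass through a softmax, they are non-negative and sum to one, so they can be read directly as estimated class proportions rather than as a single hard label.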

EDs are known rarely to enclose socially homogeneous areas and can thus be thought of as analogous to mixed pixels. Applying the technique to SAS data, however, requires a number of conceptual and technical adaptations. The ED must be reconceptualised as a unit enclosing a mixture of households from definable social groups. If this conception can be upheld, the SAS for an ED may be thought of as the result of carrying out a census on a specific social mixture, and therefore as internalising information about that mixture. The household groups thus play the role that land cover classes play in the image processing version of the technique. The Sample of Anonymised Records is used to generate a classification that divides households into a number of groups. Using these household groups, synthetic ED populations of known group composition can be constructed. SAS-like data are then derived from these synthetic ED populations and used to train an ANN to model the proportional presence of each household group. The result is an ANN which, when presented with the real SAS for an ED, models the proportional presence of the predefined household groups within it, thus giving a detailed account of the ED's social composition.
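The construction of synthetic EDs can be sketched as follows. The household pool, the group names and the three attribute variables are hypothetical stand-ins for Sample of Anonymised Records households and SAS tables. Group proportions are drawn from a flat Dirichlet distribution, which is an assumption rather than the sampling scheme of the study. Each synthetic ED yields one training pair: SAS-like percentages as the input and the known group composition as the target.

```python
import random

# Hypothetical SAR-like household pool: each household group maps to example
# household records. Variable names are illustrative, not real SAS/SAR fields.
POOL = {
    "group_1": [{"owner_occupied": 1, "no_car": 0, "lone_pensioner": 0},
                {"owner_occupied": 1, "no_car": 0, "lone_pensioner": 1}],
    "group_2": [{"owner_occupied": 0, "no_car": 1, "lone_pensioner": 0},
                {"owner_occupied": 0, "no_car": 1, "lone_pensioner": 1}],
    "group_3": [{"owner_occupied": 0, "no_car": 0, "lone_pensioner": 0},
                {"owner_occupied": 1, "no_car": 1, "lone_pensioner": 1}],
}
VARIABLES = ["owner_occupied", "no_car", "lone_pensioner"]
GROUPS = sorted(POOL)


def allocate(proportions, n):
    """Turn group proportions into whole household counts summing exactly to n
    (largest-remainder rounding)."""
    raw = {g: proportions[g] * n for g in GROUPS}
    counts = {g: int(raw[g]) for g in GROUPS}
    shortfall = n - sum(counts.values())
    by_remainder = sorted(GROUPS, key=lambda g: raw[g] - counts[g], reverse=True)
    for g in by_remainder[:shortfall]:
        counts[g] += 1
    return counts


def random_proportions(rng):
    """Group proportions from a flat Dirichlet distribution (an assumption)."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in GROUPS]
    total = sum(draws)
    return {g: d / total for g, d in zip(GROUPS, draws)}


def synthetic_ed(proportions, n_households, rng):
    """Build one synthetic ED; return (SAS-like percentages, known composition)."""
    counts = allocate(proportions, n_households)
    households = []
    for g in GROUPS:
        households.extend(rng.choice(POOL[g]) for _ in range(counts[g]))
    # 'Census' the synthetic population: % of households with each attribute.
    sas = {v: 100.0 * sum(h[v] for h in households) / n_households for v in VARIABLES}
    composition = {g: counts[g] / n_households for g in GROUPS}
    return sas, composition


def training_set(n_eds, n_households=200, seed=0):
    """Many (input, target) pairs for training a soft classifier."""
    rng = random.Random(seed)
    return [synthetic_ed(random_proportions(rng), n_households, rng)
            for _ in range(n_eds)]


if __name__ == "__main__":
    sas, composition = training_set(1)[0]
    print(sas)
    print(composition)
```

The target is the realised composition (whole household counts divided by ED size) rather than the drawn proportions, so each input is paired with exactly the mixture that produced it.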

Following a brief account of the principles and methods by which the unmixing has been carried out, this paper illustrates its successful application to the city of Southampton. Each ED within the city boundary has been unmixed to model the presence of fifteen distinct household groups within it. An 'unmixed' model of the city's socio-spatial structure is compared with a conventional geodemographic model. The differences in the level of information that each approach provides are explored, and the implications of this new, sub-ED information for understanding Southampton's socio-spatial structure are discussed. The paper also describes how the quality of the unmixed model can be assessed, a particularly difficult task because the model provides information at a level that is otherwise not readily available.
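The difference in information between the two approaches, and one possible way of scoring modelled proportions where the true composition is known (for example on held-out synthetic EDs), can be illustrated with a few helper functions. This scoring scheme is an assumption, not necessarily the assessment method used in the study, and all function names are hypothetical.

```python
import math


def hard_label(proportions):
    """Conventional geodemographic-style label: the single dominant group."""
    return max(proportions, key=proportions.get)


def masked_share(proportions):
    """Share of households not in the dominant group, i.e. hidden by a single label."""
    return 1.0 - max(proportions.values())


def rmse(predicted, known):
    """Root-mean-square error between modelled and known group proportions."""
    return math.sqrt(sum((predicted[g] - known[g]) ** 2 for g in known) / len(known))


if __name__ == "__main__":
    # Two hypothetical EDs that receive the same hard label but differ greatly in mix.
    ed_a = {"group_1": 0.90, "group_2": 0.05, "group_3": 0.05}
    ed_b = {"group_1": 0.40, "group_2": 0.35, "group_3": 0.25}
    print(hard_label(ed_a), hard_label(ed_b))
    print(round(masked_share(ed_a), 2), round(masked_share(ed_b), 2))
    print(round(rmse(ed_b, ed_a), 3))
```

Both example EDs receive the same single label, yet most of the households in the second do not belong to the labelled group. An unmixed model retains exactly this distinction.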

The paper concludes with a consideration of the range of analyses which such a technique might facilitate if developed further.