Methods for an Intelligent Continuous Surface Transformation of Socio-Economic Data for Europe

Stan Openshaw and Tim Perrée
Centre for Computational Geography, School of Geography, University of Leeds, Leeds LS2 9JT, United Kingdom

Background
Areal interpolation is an important computational technique in geography. It is used for transforming data associated with a set of discrete source zones, of a particular type and scale, first to a continuous surface form, which is then integrated to yield discrete estimates for a different set of target zones, which may be smaller or larger than the original source zones. This problem typically arises when it is necessary to merge data collected for different sets of areal entities into a common spatial framework.
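The second step described above, integrating a continuous surface to obtain target-zone estimates, can be illustrated on a regular grid. The following is a minimal sketch only, assuming the surface has already been rasterised into cell counts and that each cell has been assigned to exactly one target zone; the function name and the example arrays are hypothetical and do not come from the project.

```python
import numpy as np

def aggregate_to_zones(surface, zone_ids, n_zones):
    """Sum a gridded count surface over a set of target zones.

    surface  : 2-D array of counts per grid cell (hypothetical input)
    zone_ids : 2-D integer array of the same shape giving each cell's zone
    n_zones  : number of target zones (ids run from 0 to n_zones - 1)
    """
    return np.bincount(zone_ids.ravel(),
                       weights=surface.ravel(),
                       minlength=n_zones)

# Illustrative 2 x 4 grid with three target zones.
surface = np.array([[1.0, 2.0, 3.0, 4.0],
                    [5.0, 6.0, 7.0, 8.0]])
target = np.array([[0, 0, 1, 1],
                   [0, 2, 2, 1]])

totals = aggregate_to_zones(surface, target, 3)
print(totals)
```

Because every cell belongs to exactly one zone, the target-zone totals sum to the total of the surface, whatever zoning is used.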

Normally areal interpolation is used for small areas which lie wholly within one country. Research associated with the EU-funded Medalus III project requires the areal interpolation of a range of social, economic, demographic and environmental data across Europe onto a common spatial framework for which there are digital terrain model data, climatic change modelling estimates, and outputs from various hillslope and hydrological models. This presents challenges in several areas, including the choice of a good interpolation method; how to handle the varying quality, completeness and availability of source data; the inclusion of knowledge in the interpolation and aggregation process; and the assessment of errors, which may well propagate.

The aim is to develop an innovative GIS data modelling methodology that can be used to link physical and environmental system models, being operated at various scales in several study areas, with regional scale spatial models of the principal socio-economic and demographic systems. This will enable the development of a set of regional indicators which can provide a planning tool for application to potential land degradation and desertification at regional, national and European scales over the coming decades. The product of this work will be a European scale frameless database based on regular gridded intermediate zones that can be aggregated to differing levels of resolution.

Methodology
Many different approaches to solving areal interpolation problems have been developed. A few studies have been concerned with comparing the efficiency and accuracy of different methods; see Fisher & Langford (1995). One of the best simple methods is that of Tobler (1979). He uses a pycnophylactic interpolation method that creates smooth surfaces from data reported for a set of irregularly shaped geographical regions. The elevation of the surface represents an attribute (e.g. population density). The pycnophylactic (mass preserving) property of this method is achieved by imposing the common sense constraint that the surface, when integrated over each source zone, reproduces the count originally reported for that zone. Tobler's iterative method has the simple objective of minimising the curvature of the derived surface, and assumes no information about local non-homogeneity or anisotropy. As a result, this method has been criticised for its simplicity (Goodchild et al., 1993).
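The iterative, mass preserving idea can be sketched on a regular grid as follows. This is a simplified illustration and not Tobler's published algorithm: smoothing is done with a five-cell neighbourhood average, and the mass constraint is restored by multiplicative rescaling within each zone. The function name, the iteration count and the example zoning are assumptions made for illustration, and every zone is assumed to contain at least one grid cell.

```python
import numpy as np

def pycnophylactic(zone_ids, zone_counts, n_iter=200):
    """Simplified mass-preserving smoothing on a regular grid."""
    zone_ids = np.asarray(zone_ids)
    counts = np.asarray(zone_counts, dtype=float)
    n_zones = counts.size
    flat = zone_ids.ravel()
    cells_per_zone = np.bincount(flat, minlength=n_zones)

    # Start from a uniform density inside each source zone.
    surface = (counts / cells_per_zone)[zone_ids]

    for _ in range(n_iter):
        # Smooth: mean of each cell and its four neighbours,
        # replicating edge values at the grid boundary.
        p = np.pad(surface, 1, mode="edge")
        surface = (p[1:-1, 1:-1] + p[:-2, 1:-1] + p[2:, 1:-1]
                   + p[1:-1, :-2] + p[1:-1, 2:]) / 5.0

        # Counts cannot be negative.
        surface = np.clip(surface, 0.0, None)

        # Restore the mass constraint: rescale each zone so that its
        # cells sum to the count originally reported for that zone.
        sums = np.bincount(flat, weights=surface.ravel(), minlength=n_zones)
        scale = np.divide(counts, sums,
                          out=np.zeros_like(counts), where=sums > 0)
        surface = surface * scale[zone_ids]

    return surface

# Illustrative 3 x 4 grid with three source zones.
zones = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [2, 2, 2, 2]])
counts = np.array([40.0, 8.0, 12.0])
surface = pycnophylactic(zones, counts)
```

After the final rescaling step the cells of each zone sum to that zone's reported count, while values near zone boundaries have been pulled towards those of their neighbours.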

Tobler’s method is currently being used in the Global Demography Project (Tobler et al., 1995; NCGIA, 1996). This is an example of the application of areal interpolation on a global scale. The project is generating consistent, spatially referenced global population data sets for global and regional environmental analysis and demographic research. Global change research has largely neglected the human dimensions of environmental degradation processes. This is partly due to the lack of socio-economic datasets in a suitable format, with a suitable level of detail for large areas. Some of the issues being addressed by this project are also being addressed by our work. These include the problems of integrating heterogeneous databases from different sources, consisting of data sets collected for different time periods at different scales, and producing areal interpolations which can be used for local scale and multinational scale modelling.

In a European context the inherent simplicity of Tobler's method is an apparent strength. The output of this method in the form of a smooth surface can be linked with physical models at a range of scales, from local areas within a single country to large areas of Europe. Auxiliary information about a location can reduce the errors introduced by areal interpolation (Monmonier & Schnell, 1984; Goodchild et al., 1993; Langford et al., 1991; Flowerdew & Green). Some of the criticisms of the pycnophylactic method can be addressed through the use of auxiliary information, which can help provide a more realistic interpolation of attributes such as population distribution. Information such as the location of urban and rural areas, lakes and mountains can be obtained from remote sensing data and digital elevation models and used to add intelligence to the interpolation method. This is not just a matter of identifying the zero population areas. DTM data can be used to add intelligence to the methodology by excluding population from areas where knowledge suggests people do not live; for example, steep slopes or areas without roads. Additionally, other information about the empirical regularities often found in socio-economic data can be used to refine the crude surface interpolations; for example, three dimensional population density models. This approach, combined with statistical accounting constraints, produces sensible, intelligent, local cross-area interpolation.
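One way of combining auxiliary information with the accounting constraint can be sketched as follows: cells judged uninhabitable are set to zero and the population removed from them is redistributed over the remaining cells of the same zone. This is a minimal illustration only. The 30 degree slope threshold, the function name and the example arrays are hypothetical, and a zone with no habitable cells would lose its population in this simple version.

```python
import numpy as np

def mask_and_redistribute(surface, zone_ids, habitable, n_zones):
    """Zero out uninhabitable cells while preserving each zone's total."""
    flat = zone_ids.ravel()
    # Zone totals before masking: these are the accounting constraints.
    totals = np.bincount(flat, weights=surface.ravel(), minlength=n_zones)

    # Remove population from cells flagged as uninhabitable.
    masked = np.where(habitable, surface, 0.0)
    kept = np.bincount(flat, weights=masked.ravel(), minlength=n_zones)

    # Rescale the remaining cells so each zone total is unchanged.
    scale = np.divide(totals, kept,
                      out=np.zeros_like(totals), where=kept > 0)
    return masked * scale[zone_ids]

# Illustrative 2 x 4 grid with two zones and a slope layer in degrees.
surface = np.full((2, 4), 5.0)
zones = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1]])
slope = np.array([[5.0, 40.0, 10.0, 10.0],
                  [5.0, 5.0, 10.0, 50.0]])
habitable = slope < 30.0  # hypothetical threshold

result = mask_and_redistribute(surface, zones, habitable, 2)
```

The same pattern extends to graded weights (for example, a land cover class weight per cell) by multiplying the surface by the weight layer instead of a binary mask before rescaling.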

Interpolation should ideally be accompanied by some measure of reliability. We plan to compare the results of our interpolations with data from other sources (depending on availability). It would also be of interest to use an existing method of modelling errors in areal interpolation (Fisher & Langford, 1995) to provide a more objective assessment. Additionally, Monte Carlo simulation will be used to investigate error propagation via the use of High Performance Computing hardware.
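A generic Monte Carlo approach to error propagation can be sketched as follows. It does not reproduce the method of Fisher & Langford (1995); it simply perturbs the source counts many times, repeats a fixed interpolation, and reports the spread of the target-zone estimates. The Poisson noise model, the weights matrix (the assumed share of each source zone falling in each target zone) and the function name are all assumptions made for illustration.

```python
import numpy as np

def monte_carlo_spread(source_counts, weights, n_runs=1000, seed=0):
    """Propagate assumed source-count noise through a linear interpolation.

    weights[i, j] is the assumed share of source zone i that falls in
    target zone j; each row sums to 1.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(source_counts, dtype=float)

    # Assumed noise model: each source count is redrawn as a Poisson variate.
    draws = rng.poisson(counts, size=(n_runs, counts.size))

    # Apply the same interpolation to every simulated data set.
    estimates = draws @ weights
    return estimates.mean(axis=0), estimates.std(axis=0)

# Two source zones mapped onto three target zones (illustrative values).
source = np.array([1000.0, 500.0])
weights = np.array([[0.5, 0.5, 0.0],
                    [0.0, 0.25, 0.75]])

mean_est, spread = monte_carlo_spread(source, weights)
```

Each run is independent of the others, so the simulation divides naturally across processors, which is what makes it suited to High Performance Computing hardware.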

References
Fisher, P.F. and Langford, M. 1995. "Modelling the errors in areal interpolation between zonal systems by Monte Carlo simulation", Environment and Planning A, 27, 211-224.

Goodchild, M.F., Anselin, L. and Deichmann, U. 1993. "A framework for the areal interpolation of socioeconomic data", Environment and Planning A, 25, 383-397.

Langford, M., Maguire, D.J. and Unwin, D.J. 1991. "The areal interpolation problem: estimating population using remote sensing in a GIS framework". In Masser, I. and Blakemore, M., eds, Handling Geographical Information: Methodology and Potential Applications. Harlow, Essex: Longman, pp.55-77.

Monmonier, M. and Schnell, G. 1984. "Land use and land cover data and the mapping of population density", International Yearbook of Cartography, 24, 115-121.

NCGIA, 1996. The Global Demography Project, http://ncgia.ncgia.ucsb.edu:80/~uwe/

Tobler, W.R. 1979. "Smooth Pycnophylactic Interpolation for Geographical Regions", J. American Statistical Association, 74, 519-529.

Tobler, W.R., Deichmann, U., Gottsegen, J. and Maloy, K. 1995. The Global Demography Project, NCGIA Technical Report TR-95-6, April 1995.