William Mackaness, Alistair Edwardes, and Tim Urwin.
Department of Geography, The University of Edinburgh,
Drummond Street, Edinburgh EH8 9XP, UK.
Email: wam@geo.ed.ac.uk
The aim of this research was to address the problems of rapid dissemination of large volumes of geographic data over the Internet; specifically, to find ways of reducing the volume of data transmitted in response to a request for geographic information. Data volume reduction is required to speed transmission and to facilitate rapid analysis and display. The research is concerned with the development of self-evaluating algorithms that automate this process while adhering to strict quality controls governing the locational precision of the data and the completeness of the information associated with each location. Such algorithms obviate the need for human intervention, which is time consuming and often offers no guarantee of a quality-assured result. The research was undertaken in support of the UKBORDERS on-line access system for the 2001 census digital boundary data (jointly supported by ESRC and JISC) and in collaboration with the Data Library of the University of Edinburgh. Its success is critical to the broad uptake and use of the 2001 census digitised boundary data and to the presentation and analysis of information across a range of scales and themes. The research is also fundamental to the broader issues of data integration, for example in the creation of national databases such as NLIS and ScotLIS, where it will be necessary to integrate data sources of differing scale, resolution and information content.
UKBORDERS is an online data service providing staff and students in UK Higher Education with password-controlled access to digital boundary data (DBD) describing approximately half a million different areas. This national service is jointly supported by the ESRC and the Joint Information Systems Committee of the higher education funding councils (JISC) and is delivered as part of the national services hosted by the Edinburgh University Data Library (EDINA). UKBORDERS has over 1,200 registered users at 112 higher education institutions. The DBD are delivered in a range of formats using standard file transfer protocols over the Internet. The digital boundary data include the Enumeration Districts and Output Areas of the 1991 Census, and the Postcode Unit boundaries for Scotland. Using these as 'building blocks', a wide variety of higher boundaries covering census, administrative, electoral and postal areas can be created. The boundary data can be accessed at a range of aggregations: the finest scale for England and Wales is the enumeration district, and the coarsest is the county level (Table 1). These datasets have been created by the General Register Office for Scotland (GRO) and the EDLINE consortium and are supplied via the UKBORDERS system.
England                        | Wales                        | Scotland
Counties: 46                   | Counties: 8                  | Regions/island areas: 12
Districts: 366                 | Districts: 37                | Districts: 56
Electoral wards: 8,985         | Electoral wards: 945         | Pseudo-sectors: 1,003
Enumeration districts: 106,866 | Enumeration districts: 6,330 | Output areas: 38,255
                               |                              | Postcode units: 132,080
Table 1. The boundary types available through UKBORDERS. Adapted from Denham (1993, p.58)
The data are selected via a menu-driven interface and the requested files are created using ArcInfo. The user first delimits the geographical extent, defining it in terms of counties, districts or electoral wards, and then selects the detail of the data used to fill the geographic region, defining it in terms of districts, electoral wards or enumeration districts. Once the user has made their selection, a copy of their data is held in temporary storage at the UKBORDERS ftp site for a period of a week, during which time the user can download the data to their site.
The minimum geographic extent a user can select using UKBORDERS is a single district. However, if multiple districts or counties are selected, the individual data sets are joined to form a single new boundary data set which the user can then download from the ftp site. These files can be quite large: for example, the file of district boundaries covering England and Wales is 33 megabytes and contains 1.3 million vertices.
The complete 'library' of digitised boundaries is approximately 1-2 gigabytes in size, due in part to the fine geographic precision with which the base boundary data are defined. User requests (typically subsets of the complete library) may (1) cover a large geographical extent; (2) define a region of complex boundaries (particularly one containing a convoluted section of coastline, or an urban area containing a very large number of smaller polygons); or (3) be a joined boundary data set consisting of a large number of shorter lines densely populated with points defining their shape. Such files can (1) cause client software to fail because of memory constraints or software design, (2) take a long time to transmit over the Internet, and (3) take a long time to load for screen display and interactive use. There is therefore a need to develop a simplification facility, as part of the interactive selection process, that reduces the amount of data without losing its characteristic form or compromising its use in further analysis and integration with census attributes.
The objective of this research was therefore to develop techniques that reduce the volume of data in order to encourage the widest possible use of the data, minimise download time and enable dynamic interaction with very large data sets. This is not always a straightforward task, however, as the area-specific nature of census data (and their specialist use) puts significant constraints on precisely how the data volume can be reduced. For example, it is not possible to remove or merge polygons, and extra polygons must not be generated as a by-product of the data reduction process: loss of polygons would break the linkage between the boundary dataset and the census attribute dataset, and additional polygons would mean that the map space was no longer exhausted and the boundaries were spurious.
Previous research has highlighted the hazards of allowing the user to select data reduction thresholds, since the data may already be in a form that cannot be further reduced. The focus of the research was therefore on the development of a robust self-evaluating algorithm that optimised file size reduction whilst accounting for the varying nature and characteristics of the data. In this way it could improve substantially on the default parameters of commercial GIS simplification routines, which simplify the coverage as a whole rather than treating its different parts separately. The challenge for any simplification routine is to maintain sufficient control over quality, in terms of the integrity and completeness of the database, whilst reducing the complexity and size of the files, so that there is no loss of linkage with the area-specific statistics being mapped. The context of use therefore required the algorithm to be robust and to provide a 'workable' solution in every instance.
Research into issues of map generalisation has taken place over 30 years (Muller 1991); a very large number of approaches have been considered, including the use of intelligent map agents (Baeijs et al 1996) and object-oriented approaches (Ormsby and Mackaness 1998). The objective behind all these approaches is the derivation of multi-scale products from a single detailed database. A variety of specific algorithms exist to displace objects (Mackaness 1994), to simplify networks (Mackaness and Mackechnie 1998), to simplify polygon data (Bader and Weibel 1997) and to patch-generalise regions (Muller and Wang 1992). But there is very little work that examines the analysis required to automatically parameterise these algorithms and to evaluate the quality of the resulting products; quality being assessed in terms of consistency, completeness, relative location and attribute accuracy (Muller 1991). These and other measures (such as map homogeneity and gestalt) are essential to the construction of algorithms whose behaviours are predictable and assessable. In the absence of such assessment mechanisms, it is impossible to create automated environments in which algorithms are capable of self-assessment. The result is very labour-intensive sessions with a GIS, which 'assists' but does not replace the user (Medyckyj-Scott and Hearnshaw 1993). In the absence of analysis and automated evaluation, the assessment of these simplification algorithms takes the form of visual inspection, which is tedious and itself error-prone (Painho 1995). It is considered that this research makes an important contribution to the development of analysis and evaluation algorithms that obviate the need for user involvement in the way described by Painho (1995).
In addition the research makes a contribution to the area of optimisation of automated cartographic solutions which employ procedural and knowledge based techniques to apply multiple generalisation operators (see for example Wang and Muller, 1998).
The basic methodology adopted in the development of the self-evaluating algorithm was viewed as a three-stage process: analysis, synthesis and evaluation. The first phase (analysis) aimed to define the intrinsic and unique characteristics of a given dataset. Statistical properties such as the mean and standard deviation of the number of points per boundary line, and the average length of an arc segment of a boundary, were measured in order to determine the homogeneity of the dataset. Through empirical analysis of a range of datasets it was clear that urban regions of a map could be distinguished from rural regions by examining a number of properties of the boundary of a land parcel. For example, urban boundaries typically had an anthropogenic look to them, an appearance generally accounted for by the number of vertices on the line. This information was used to select an appropriate algorithm based on whether the dataset was wholly urban, wholly rural or a mixture of the two. The rules employed to perform this test were that if the mean number of vertices per arc was small (15 vertices or fewer) and the associated standard deviation was likewise small, then the dataset was deemed homogeneous. If the average length of an arc in this homogeneous dataset was then found to be less than 400 metres, the dataset was also deemed wholly urban. If the dataset was found to be mixed, and therefore non-homogeneous, it was split into its constituent urban and rural parts by discriminating between arcs with fewer than 15 vertices and those with more. These two parts were then passed to the associated rural and urban algorithm routines. The classification values were defined empirically.
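The classification rule above can be sketched as follows. This is a minimal Python illustration, not the ArcInfo routine actually used: the 15-vertex and 400-metre thresholds come from the text, but the standard-deviation cut-off and all function names are assumptions.

```python
import statistics

def classify_dataset(arcs):
    """Classify a boundary dataset as urban, rural or mixed.

    arcs: list of (num_vertices, length_m) tuples, one per boundary arc.
    """
    n_vertices = [n for n, _ in arcs]
    mean_v = statistics.mean(n_vertices)
    sd_v = statistics.pstdev(n_vertices)
    # Homogeneous: small mean vertex count and a similarly small spread
    # (the text gives no numeric spread threshold; 15 is assumed here)
    if mean_v <= 15 and sd_v <= 15:
        mean_len = statistics.mean(length for _, length in arcs)
        # Short arcs in a homogeneous dataset indicate anthropogenic (urban) boundaries
        return "urban" if mean_len < 400 else "rural"
    return "mixed"

def split_mixed(arcs):
    """Split a mixed dataset at the 15-vertex threshold, as described in the text."""
    urban = [a for a in arcs if a[0] < 15]
    rural = [a for a in arcs if a[0] >= 15]
    return urban, rural
```

Each part of a mixed dataset would then be passed to its own simplification routine with its own tolerance.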
It was observed that in urban areas boundaries followed anthropogenic features (principally roads) and therefore had significantly fewer vertices than boundaries in more rural areas, which were based on natural features such as rivers. Differentiating between the categories allowed a more appropriate parameter to be selected for each, making the process more accurate than applying a single parameter to the entire dataset. The simplification routines themselves used further statistical information calculated from the sub-setted datasets to derive a tolerance parameter. Methods to derive an appropriate parameter were a fundamental focus of the research and are outlined in detail later.
The second stage was to create a solution (synthesis) using the information from the analysis phase. The practical concern of the research was to produce an algorithm for data reduction within the constraints of time, software, data integrity and robustness. The constraint of time required the algorithm to operate within the time-frame reasonably expected for Internet transmission. The constraint of software meant the algorithm was implemented using ArcInfo GIS software, since this is the format in which the data are stored. The constraint of data integrity meant the algorithm had to cope with exceptional boundary delineations and with any inconsistencies or errors in the datasets, whether inherent or created by the process of building higher-level boundaries from lower-level administrative units. The requirement of robustness meant that a workable solution was required every time the simplification algorithm was called. In some cases, where the data were already in a reduced form, the solution might be to return the unsimplified form of the dataset, since simplifying it would compromise its integrity. For most datasets the solution was to simplify at least one of the component parts of the dataset and recombine them. Whilst the method aimed to provide an acceptable solution through optimum selection of the tolerance parameter, it was not possible to guarantee that unwanted by-products of data reduction had not occurred. In the event of such topological inconsistencies being created by the simplification process, a further routine of the algorithm was applied to the dataset to remove the small number of affected lines and replace them with their original form. This process is described later.
The result was evaluated to ensure that the solution fell within the stated criteria. If it failed, because there were fewer polygons after simplification, the user was informed that the procedure had failed and was supplied with the original data. This evaluation phase included the creation of a metadata file outlining the changes that had occurred in the dataset, so that the user could evaluate the changes that had been made. An example of such a file is shown in Figure 9. This three-phase methodology was built around the Douglas-Peucker algorithm for line simplification (described below). It should be stressed that any number of alternative simplification algorithms could have been used; the focus of this research was on the analysis and self-evaluation methodology rather than the simplification algorithm itself.
The Douglas-Peucker algorithm (Douglas and Peucker, 1973) operates in an iterative manner, successively caricaturing a digitised line and removing all points that lie within a specified tolerance corridor of this representation. Typically the choice of a tolerance value is a largely subjective element of the method, meaning that parameterisation is usually an intuitive exercise, carried out by visually inspecting the resultant maps and experimenting with varying tolerances. This is an arduous and potentially unreliable task when undertaken manually (Buttenfield 1991, Zhan and Buttenfield 1996). An essential element of the research was to provide techniques for selecting this tolerance parameter by automated means, so that the value was relevant to the particular properties of the dataset rather than being arbitrarily selected by the user. It was first necessary to determine the effects of simplification on data volume and data resolution. Initially, analysis was restricted to the properties of single lines. However, the processing overhead of assigning tolerance values on a line-by-line basis was too great, and the range of line forms was too complex to provide a means of classifying the lines using the statistical information available from standard GIS functions (Buttenfield 1985). As an alternative approach, statistical analyses were undertaken at the dataset level of aggregation, using a range of datasets representative of typical user queries.
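For illustration, a minimal recursive form of the Douglas-Peucker algorithm might look like the sketch below. The project itself relied on the GIS's built-in implementation; this version simply shows the anchor-floater recursion the text describes.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the infinite line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == dy == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)

def douglas_peucker(points, tolerance):
    """Remove all vertices lying within `tolerance` of the caricatured line."""
    if len(points) < 3:
        return list(points)
    # Find the vertex furthest from the line joining the two end points
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax <= tolerance:
        # Every intermediate vertex lies inside the corridor: discard them all
        return [points[0], points[-1]]
    # Otherwise split at the furthest vertex and recurse on both halves
    left = douglas_peucker(points[:index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right
```

Note how the whole outcome hinges on the single `tolerance` argument, which is precisely the parameter the research sought to derive automatically.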
A series of experiments was undertaken using 17 test datasets to determine how changes in a dataset might be approximated by mathematical functions, and what relationships might exist between these functions and properties of the datasets. In particular, the changes in total line length and in total number of vertices with respect to tolerance were examined, since these provided measures of the change in data quality and in data volume respectively. The datasets were first analysed individually, using least-squares regression to define the equations relating change in tolerance to change in length and change in number of vertices. The datasets were then analysed together to find relationships between the coefficients of the approximated functions and properties of the datasets, again using least-squares regression. Typical profiles for four of the test datasets are shown in Figure 1, showing reduction in the number of vertices as a function of tolerance. Figure 2 shows the relationship between total line length and increasing tolerance. Figure 3 is a two-dimensional plot of a three-dimensional relationship, plotting vertices against total length for each value of tolerance used between 1 and 50 metres; a subset of tolerance values is labelled on the curve. For clarity only the result for Dorset county wards is shown (the other examples generate similar profiles).
Figure 1 Reduction of vertices as a function of simplification tolerance for a range of datasets.
Figure 2 Reduction in percentage of total length as a function of simplification tolerance for a range of test datasets.
Figure 3. The relationship between total length, number of vertices and simplification tolerances for Dorset county wards.
Figure 1 indicates that the loss of vertices with increasing tolerance proceeds at an exponential rate, and that the particular profile is determined by the characteristics of each individual dataset. The general form of this relationship was found to be best approximated by the function y = ax^{-b}. Figure 2 indicates that the loss in total line length proceeds at a rate approximately linear in the increase in tolerance, a relationship of the general form y = ax + b. Here again the gradient (a) of this function was found to differ with each individual dataset. In effect, Figure 3 illustrates that low tolerances produce high rates of change in vertex numbers and small changes in total line length (tolerance values between 1 and 10 in Figure 3), but that at higher tolerances increasingly fewer vertices are removed for a rapidly increasing loss in total line length.
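Both fitted forms can be recovered by ordinary least squares, linearising the power law with logarithms. The sketch below illustrates the stated fitting method; the function names are illustrative, not from the original analysis.

```python
import math

def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def fit_power(tolerances, vertices):
    """Fit y = a * x^(-b) by regressing log y on log x.

    log y = log a - b * log x, so the slope is -b and the
    intercept is log a.
    """
    slope, intercept = fit_linear([math.log(t) for t in tolerances],
                                  [math.log(v) for v in vertices])
    return math.exp(intercept), -slope
```

Given a handful of (tolerance, vertex count) observations from trial simplifications of a dataset, `fit_power` yields the a and b coefficients whose parameterisation is discussed below.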
Following from these observations, the next stage was to develop techniques for selecting the tolerance of the simplification algorithm. A tolerance value can be selected using two different approaches: the first selects a tolerance that minimises the loss of quality; the second chooses a tolerance that optimises volume reduction. Changes in line length are a surrogate indicator for changes in dataset quality, and reduction in vertex numbers is a yardstick for file size reduction. Three techniques were investigated to manipulate these controls: the first used the change in total line length, and the other two modelled tolerances using changes in vertex numbers. These three approaches are described below.
The analysis of the loss in line length against vertices revealed that in most of the test datasets a proportion of data could be removed without any loss in data quality. This observation is illustrated in Figure 4, represented as a short flat 'plateau' section at the top of the curve. The flat section represents loss of vertices, but a negligible loss of length, up to the point where the line steepens. Theoretically this 'plateau' could also arise because no vertices fell within the tolerance corridor of the caricatured line, i.e. because the tolerance was simply too small to filter out any data points. However, comparisons between experimental plots showed a significant reduction in vertex numbers for the corresponding tolerance ranges, indicating that this was not the case. The phenomenon occurs because excessive data points have been used to represent the boundary information, which can arise from stream digitising or scanning in the data capture process (Barber et al 1995, Herbert et al 1992).
Figure 4. Illustration of the tolerance selection process for quality controlled simplification.
The challenge is to find the tolerance value at the point where the line steepens, i.e. where loss of line length begins. The equation for the straight line was determined by simplifying the dataset twice using different tolerances. Using this equation, a tolerance could then be determined that represented the point on this line equivalent to the total line length prior to simplification. Figure 4 illustrates diagrammatically how this was achieved.
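The two-probe procedure can be sketched as follows (a minimal illustration; the function and variable names are assumptions, not the original implementation).

```python
def quality_controlled_tolerance(original_length, probe1, probe2):
    """Sketch of the quality-controlled selection in Figure 4.

    Probe the dataset at two trial tolerances, fit the straight (steep)
    part of the length profile through the two results, and solve for
    the tolerance at which that line meets the original total length.

    probe1, probe2: (tolerance, total_length) pairs from two trial
    simplifications of the same dataset.
    """
    (t1, l1), (t2, l2) = probe1, probe2
    gradient = (l2 - l1) / (t2 - t1)    # length lost per unit of tolerance
    intercept = l1 - gradient * t1      # so that l = gradient * t + intercept
    # Tolerance at which the fitted line predicts the unsimplified length,
    # i.e. the end of the 'plateau' where loss of length begins
    return (original_length - intercept) / gradient
```

For example, if the original length is 100 km and trial tolerances of 10 m and 20 m leave 90 km and 70 km respectively, the fitted line reaches 100 km at a tolerance of 5 m, which is the value selected.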
Volume-controlled generalisation is the alternative to a quality-controlled approach. This approach considered the datasets collectively and examined how variations in the characteristics of individual datasets might affect the relationship between the reduction in numbers of data points and changing tolerance. As previously observed, this relationship was determined by the coefficients a and b of the approximated function y = ax^{-b}. Different properties of each dataset were therefore correlated with the corresponding a and b coefficients, and relationships were sought amongst the datasets using least-squares regression. The coefficient a was found to correlate linearly with the total number of vertices. This is expected since, in the function y = ax^{-b}, y equals a when x is 1 (1 raised to any power being 1). The coefficient b was found to correlate best with the average number of vertices per line. This relationship indicated an exponential reduction in the rate at which points were lost as the average complexity of the line decreased. Figure 5 illustrates the parameterisation of coefficient b for the 17 datasets.
Figure 5. Relationship between the constant b and the average number of vertices per arc.
Analysis and parameterisation of the function coefficients meant that a prediction of the rate of change in vertices could be made from the initial conditions of a dataset. This could then be used to predict the tolerance necessary to obtain a desired degree of data reduction. For instance, it was found by empirical observation that a tolerance corresponding to a 50% reduction in data points was often sufficiently safe for simplifying most datasets, 'safe' meaning that the selected tolerance fell within the range of low tolerances where vertex reduction was most rapid.
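Given fitted coefficients a and b, the tolerance for a target reduction follows by inverting N(t) = a·t^(-b); a sketch (function name illustrative):

```python
def tolerance_for_reduction(a, b, original_vertices, keep_fraction=0.5):
    """Invert the fitted model N(t) = a * t^(-b).

    Solve a * t^(-b) = keep_fraction * original_vertices for t:
        t = (a / target)^(1/b)
    With keep_fraction=0.5 this gives the tolerance predicted to
    achieve the 50% vertex reduction quoted in the text.
    """
    target = keep_fraction * original_vertices
    return (a / target) ** (1.0 / b)
```

Since a corresponds to the vertex count at a tolerance of 1, a dataset of 1,000 vertices with b = 1 would be predicted to need a tolerance of 2 to halve its vertex count.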
The third approach to tolerance selection differed in that, whilst it was also derived from the analysis of the reduction in numbers of vertices with tolerance, it gave more attention to the properties of the dataset so that the final tolerance selection was less subjective than in the second technique. The method aimed to define an optimum tolerance value for the profile: the point at which the change in percentage number of vertices equals the change in tolerance with respect to one another. This point is shown for three of the test datasets in Figure 6, and can be described as the point on each curve closest to the origin.
Figure 6. Illustrating the optimum point for file reduction.
This 'turning-point' marked the transition between the rapid reduction in the number of points and the limit beyond which further simplification was likely to lead to loss of shape and self-intersection. Analysis was undertaken to derive this point for each of the test datasets and then to correlate it with an appropriate dataset property using least-squares regression. The most appropriate property was found to be vertices per metre. Figure 7 shows the correlation between optimum tolerance and vertices per metre for each of the 17 datasets.
Figure 7 The Relationship between measured optimum tolerance values and average number of vertices per metre.
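The 'closest to the origin' criterion of Figure 6 can be sketched directly on a sampled profile. The normalisation used below (tolerance by its maximum, vertices by 100%) is an assumption made to put the two axes on comparable scales; the original analysis derived the point analytically.

```python
import math

def optimum_tolerance(profile):
    """Return the tolerance at the profile point closest to the origin.

    profile: list of (tolerance, pct_vertices_remaining) pairs sampled
    from trial simplifications, with percentages in the range 0-100.
    """
    t_max = max(t for t, _ in profile)
    # Normalise both axes so distance to the origin is meaningful
    best = min(profile,
               key=lambda p: math.hypot(p[0] / t_max, p[1] / 100.0))
    return best[0]
```

On a profile such as [(1, 90), (5, 50), (10, 40), (50, 35)] the turning point falls at a tolerance of 10: beyond it, large increases in tolerance buy almost no further vertex reduction.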
Simplification using these methods of deriving a tolerance could not guarantee that self-intersection would never occur: given the wide variation in the properties of the polygons in any dataset, no selected tolerance could be guaranteed not to produce self-intersecting boundaries. Yet a robust working environment demanded exactly that. The design of the algorithm therefore needed a technique for detecting and ameliorating conflicts of this nature in the event that self-intersection resulted from the simplification process. A method was designed to remove the arcs associated with a conflict and replace them with their original, unsimplified, lines. Figure 8 describes this process.
Figure 8. Illustration of re-insertion method used to preserve topology.
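The re-insertion step of Figure 8 can be sketched as follows: any simplified arc that intersects itself is replaced wholesale by its original geometry, preserving topology at the cost of less reduction on that arc. The naive O(n²) segment test below is for illustration only; a GIS would use its own topology engine.

```python
def segments_cross(p1, p2, p3, p4):
    """True if segment p1-p2 properly crosses segment p3-p4."""
    def ccw(a, b, c):
        return (c[1] - a[1]) * (b[0] - a[0]) > (b[1] - a[1]) * (c[0] - a[0])
    return (ccw(p1, p3, p4) != ccw(p2, p3, p4)) and (ccw(p1, p2, p3) != ccw(p1, p2, p4))

def self_intersects(points):
    """Check an open polyline for crossings between non-adjacent segments."""
    segs = list(zip(points, points[1:]))
    for i in range(len(segs)):
        for j in range(i + 2, len(segs)):   # adjacent segments share a vertex
            if segments_cross(*segs[i], *segs[j]):
                return True
    return False

def repair(original_arcs, simplified_arcs):
    """Replace each self-intersecting simplified arc with its original form."""
    return [orig if self_intersects(simp) else simp
            for orig, simp in zip(original_arcs, simplified_arcs)]
```

Because only the small number of conflicting arcs revert, the bulk of the file size reduction is retained while topological correctness is guaranteed.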
The analysis provided three techniques with which to drive tolerance selection in the algorithm. In the initial versions of the algorithm only the length-controlled technique was applied, since the vertex-controlled techniques could not prevent the problem of self-intersection and boundary intersection. The length-controlled selection was unaffected by this problem as it did not alter the position of any of the boundary lines. However, it was limited in the amount of data that could be removed, which depended on the initial efficiency of information storage in the dataset, and it involved the processing overhead of two additional simplification runs, making it slower. The subsequent development of a routine to overcome the problem of self-intersection allowed a robust algorithm using the two volume-controlled techniques. This final version of the algorithm used both techniques in selecting the tolerance: it first calculated the tolerance required to achieve a 50 per cent reduction in vertices, then calculated the optimum tolerance for the dataset, and selected the lower of the two. This strategy set a ceiling of 50 per cent reduction on the algorithm, to err on the safe side. The ceiling was imposed because, in those examples where significantly more volume reduction could be achieved, the gradient describing vertex loss with tolerance was initially very steep, so the inaccuracies inherent in using a regression model for prediction could have more extreme outcomes in terms of over-simplification. The 50% value was chosen because in all of these cases it was observed to lie within the region of the curve where data reduction was most rapid, but sufficiently removed from the point of change to ensure that over-simplification did not occur.
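The final selection strategy reduces to taking the lower of the two candidate tolerances, which enforces the 50 per cent ceiling automatically. A minimal sketch (the coefficients a and b stand for the fitted model of the dataset; names are illustrative):

```python
def simplify_with_ceiling(a, b, n_vertices, t_optimum):
    """Combine the two volume-controlled techniques as described above.

    t_fifty is the tolerance predicted by N(t) = a * t^(-b) to remove
    half the vertices; taking the minimum with the optimum tolerance
    caps the reduction at 50% and errs on the safe side.
    """
    t_fifty = (a / (0.5 * n_vertices)) ** (1.0 / b)
    return min(t_fifty, t_optimum)
```

Whichever tolerance is selected, the simplified output still passes through the re-insertion check before being returned to the user.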
The algorithm was tested comprehensively on the seventeen datasets. In every instance, and in all subsequent use, the algorithm guaranteed a result that was topologically correct and retained the precise number of polygons and their IDs. The reduction in file size was as high as 80% for some datasets but more typically between 20% and 40%. Importantly, the differences between datasets before and after simplification were not visually discernible. This approach is a significant improvement over manual approaches: it achieves substantial reductions in file size, guarantees maintenance of shape and topology, and does so without any intervention, visual inspection or interaction by the user.
Maintenance of data quality was an important aim of the research. One of the aims of the implemented routine was therefore to provide details of the changes to the data brought about by simplification. This was achieved by appending information about the simplified dataset to a metadata file. The purpose of the metadata is to give the user an indication of the changes to the data, helping the user decide whether the end product is suited to their needs. Figure 9 shows an example of such a metadata file.
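A sketch of the kind of before-and-after summary such a metadata file might contain is shown below. The field names and layout are illustrative assumptions, not the actual format of the Figure 9 file.

```python
def simplification_metadata(name, before, after):
    """Summarise the changes made by simplification for the metadata file.

    before, after: dicts with 'vertices', 'length_m' and 'polygons' totals
    for the dataset prior to and following simplification.
    """
    def pct_change(b, a):
        return 100.0 * (b - a) / b
    return "\n".join([
        f"Dataset:            {name}",
        f"Polygons retained:  {after['polygons']} of {before['polygons']}",
        f"Vertices removed:   {pct_change(before['vertices'], after['vertices']):.1f}%",
        f"Length lost:        {pct_change(before['length_m'], after['length_m']):.2f}%",
    ])
```

Reporting the polygon count alongside the vertex and length changes lets the user confirm at a glance that the linkage to the census attributes has survived simplification.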
Figure 9. An example of the information appended to the meta-file by the simplification routine, in this case for Enumeration Districts in the Welsh County of Dyfed.
The final algorithm was thus a combination of a rule-based system (for preserving polygons, separating the dataset into homogeneous subsets and maintaining topological integrity) and a component for mathematically determining appropriate tolerances based on an evaluation of the statistical properties of the sub-setted datasets. The approach satisfied its objectives by conserving the integrity of the datasets both in terms of topological consistency and of linkage to third-party datasets. This was ensured by applying rules to protect small polygons from simplification and to replace intersecting or self-intersecting altered boundary lines with their original geometry. The objectives of maintaining data quality whilst maximising data reduction were met by exploiting the empirical observation that a significant volume of data points could be removed at minimal cost in lost detail, an observation optimised with respect to tolerance. Minimising the loss of detail also ensured that the completeness of the dataset was retained as far as possible, conserving the flexibility of potential end uses of the dataset. In addition, quantitative measures recording the changes in the statistical properties of the dataset were supplied to the user as metadata. The objective of robustness was handled through a series of preliminary and post-processing checks and ameliorations, so that where simplification was inappropriate for a dataset (for instance where the data were already stored at maximum efficiency) the original dataset could be returned. This met the requirements for efficiency and robustness. It should be stressed that the research focused on the automatic derivation of parameters for a line simplification algorithm without human intervention, together with evaluation techniques and the provision of metadata.
This autonomous simplification of datasets is illustrative of the way in which more fully automated environments can support a broader community of users whose domain of expertise may lie outside GIS and automated cartography.
The authors are very grateful for comments from reviewers and for funding of this project from the Economic and Social Research Council, through the award of the research grant H507255146, "Self Evaluating Generalisation Algorithms to Derive Multi Scale Boundary Sets". Gratitude is also extended to the EDLINE consortium for their support and that of GRO.
Bader, M. and R. Weibel. 1997. Detecting and resolving size and proximity conflicts in the generalisation of polygon maps. In Proceedings of the 18th ICA/ACI International Cartographic Conference, vol. 3, pp. 1525-1532.
Baeijs, C., Y. Demazeau and L. Alvares. 1996. SIGMA: Application of multi-agent systems to cartographic generalisation. In 7th European Workshop on Modelling Autonomous Agents in a Multi-Agent World, MAAMAW'96, pp. 163-176.
Barber, C., R. G. Cromley and R. Andrle. 1995. Evaluating alternative line simplification strategies for multiple representations of cartographic lines. Cartography and Geographic Information Systems, vol. 22, no. 4, pp. 276-290.
Buttenfield, B. 1985. Treatment of the cartographic line. Cartographica, vol. 22, no. 2, pp. 1-26.
Buttenfield, B. P. 1991. A rule for describing line feature geometry. In Map Generalization: Making Rules for Knowledge Representation, B. P. Buttenfield and R. B. McMaster (eds), Essex: Longman Scientific & Technical, pp. 150-171.
Denham, C. 1993. Census geography: an overview. In The 1991 Census User's Guide, A. Dale and C. Marsh (eds), London: HMSO.
Douglas, D. H. and T. K. Peucker. 1973. Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Canadian Cartographer, vol. 10, no. 2, pp. 112-122.
Herbert, G., E. Joao and D. Rhind. 1992. Use of an artificial intelligence approach to increase user control of automatic line generation. In Proceedings of EGIS '92, Munich: EGIS Foundation, vol. 1, pp. 554-563.
Mackaness, W. A. 1994. An algorithm for conflict identification and feature displacement in automated map generalisation. Cartography and Geographic Information Systems, vol. 21, no. 4, pp. 219-232.
Mackaness, W. A. and G. Mackechnie. 1998. Detection and simplification of road junctions in automated map generalisation. GeoInformatica.
Medyckyj-Scott, D. and B. Morris. 1998. The virtual map library: providing access to Ordnance Survey digital map data via the WWW for the UK Higher Education community. Computers, Environment and Urban Systems. (in press)
Medyckyj-Scott, D. and H. M. Hearnshaw. 1993. Human Factors in Geographic Information Systems. London: Belhaven Press.
Muller, J. C. 1991. Generalisation of spatial databases. In Geographical Information Systems, D. J. Maguire, M. Goodchild and D. Rhind (eds), London: Longman Scientific, pp. 457-475.
Muller, J. C. 1991. Prospects for and impediments against a new cartography in the 1990s. In Advances in Cartography, J. C. Muller (ed), London: Elsevier Applied Science, pp. 1-14.
Muller, J. C. and Z. Wang. 1992. Area-patch generalisation: a competitive approach. Cartographic Journal, vol. 29, Dec., pp. 137-144.
Ormsby, D. and W. A. Mackaness. 1998. The development of a phenomenological approach to cartographic generalisation. Cartography and Geographic Information Systems, Special Issue. (in press)
Painho, M. 1995. The effects of generalisation on attribute accuracy in natural resource maps. In GIS and Generalisation: Methodology and Practice, J. C. Muller, J. P. Lagrange and R. Weibel (eds), London: Taylor and Francis.
Wang, Z. and J. C. Muller. 1998. Line generalisation based on analysis of shape characteristics. Cartography and Geographic Information Systems, vol. 25, no. 1, pp. 3-15.
Zhan, F. B. and B. P. Buttenfield. 1996. Multi-scale representation of a digital line. Cartography and Geographic Information Systems, vol. 23, no. 4, pp. 206-228.