On the Automated Generalization of Road Network Maps

Marc van Kreveld
Dept. of Computer Science, Utrecht University,
P.O.Box 80.089, 3508 TB Utrecht, The Netherlands.
Email: marc@cs.uu.nl

Jarno Peschier
Compass Interactive BV,
Eekholt 54, 1112 XH Diemen, The Netherlands.
Email: postbus@jarno.demon.nl

1. Introduction

Automated cartographic generalization has been an important research area for years now. Besides many papers, three books are devoted exclusively to automated generalization [2],[8],[9]. Several frameworks have been proposed [1],[11], and several of the operators that actually perform generalization have been descibed, especially for line simplification.

The most basic of the generalization operators is selection: which map objects should be selected for display on a particular map at a particular scale [13]? This question has been addressed for point features, where it is called settlement selection [5],[14]. In the case of subdivisions, the dissolution operator can be used for elimination (the inverse of selection), for instance with the GAP-tree [15].

The selection of line features has been considered in the context of road network maps and drainage networks. We are aware of two papers that select subsets of the roads to perform generalization [6],[12]. Both consider whether algorithms in graph theory can be used. For example, a road network map connecting a set of cities can be generalized by only selecting the roads in the minimum spanning tree. This idea guarantees the topological property of connectedness, but there are serious limitations to the graph approach. Firstly, there is no guarantee that avoiding coalescence and imperceptibility--two of the objectives of generalization--is realized. These geometric aspects that are not present in a graph. Secondly, semantic aspects of road network generalization such as avoiding large detours are not taken into account explicitly.

Another paper [7] discusses the detection and collapse of groups of junctions to a single junction. This is another important aspect of road network generalization and complementary to road selection.

In this paper we address road network generalization by selection more rigorously. We will first define what the input to road network generalization should look like. Then we list the basic geometric, topological, and semantic requirements [16] of the output, the generalized road network. Such a list is necessary to design an approach that performs the task at hand well. Based on the list, we outline a three-step approach to road network generalization. Most of the ideas in the paper have been implemented and applied to a data set.

2. Specification of the input

A road network is modeled as an embedded graph consisting of nodes and arcs. Some of the nodes are intersections, others serve to describe the shape of the road and are basically control points. A road network may have crossing arcs due to bridges and tunnels, implying that the graph is not necessarily planar. In a realistic model of a road network, roads have an importance. Highways are the most important roads and small dirtroads are unimportant. Any road network generalization algorithm must take into account that important roads should be given preference over unimportant roads. Therefore, the importance must be part of the input specification. We model the input road network as a graph G with node set N and arc set A, and with the following information: The information listed above contains graph structure like incidence between nodes and arcs, coordinates to obtain embedding, and semantic information like importance.

From the input, two important concepts can be introduced. They are sets whose elements depend on the currently selected network.

Before applying generalization, the whole input network is selected. When the algorithm to be described operates, parts of the input network may no longer be selected and the sets defined above will change. An intersection of degree 3 may cease to be an intersection if one of the incident roads is no longer selected. Therefore, the set Q is defined in terms of the selected road network. The same holds true for the set of roads, which may merge with adjacent roads.

3. List of requirements of the output

It is essential in the design of a generalization function that attention be given first to the requirements. These can be metric, like avoiding coalescence, or topological, like maintaining connectivity, or semantic, like keeping short routes between important places. Keeping the objective of road network maps and the reasons for generalization in mind, we have come to the following requirements:
  1. Avoid coalescence: no two roads may be closer than some distance d. Here d is a tunable real number that depends on the scale of the generalized map. E.g., d can be set so that no two roads are closer than half a millimeter on the generalized map.
  2. Avoid detours: for any two important nodes p1,p2 in P, the length of the shortest path between them in the generalized network should be at most c times the length of the shortest path in the input network. Again c is a tunable parameter that must be set greater than 1.
  3. Give preference to important roads: whenever possible, important roads should be given preference for inclusion over less important roads.
The first requirement is geometric and the other two are semantic. The topological requirement that the generalized network be connected is implied by the second requirement: no connection implies an infinite detour.

The list above contains one hard requirement and two `soft' requirements. The first requirement will always be guaranteed by the algorithm, no matter what input road network. Given this decision, we cannot guarantee the second requirement, so it is soft. It may be the case that only way to avoid a long detour is to include a road that is very close to another road. The other choice would be to avoid long detours always, but then one has to accept that coalescence may occur. We have chosen to avoid coalescence over avoiding detours. The third requirement is soft in the sense that in the output network, some less important roads may be chosen while some important roads are not. This is desirable as we noted before, since less important roads may be relatively important in the neighborhood.

4. Three-step outline of the approach

We say that two roads are in conflict if their minimum distance is less than d. This statement must be interpreted appropriately. Two roads that start (end) at the same intersection always have distance 0, but we do not want the algorithm to detect a conflict (coalescence) in this case. We redefine the smallest distance between two roads as the smallest distance while excluding both parts of the road closer than d from the two ends. In the case of crossing roads (bridge or tunnel), we also exclude parts of the roads closer than d from the crossing point.

Figure 1: Circles around ends of roads, and distances between roads.

The algorithm to detect if two roads are in conflict is used often in whole generalization procedure. Therefore, it is important that this detection be done very efficiently. We treat this test first before giving the whole generalization procedure.

Suppose that two roads r1 and r2 have m1 and m2 nodes defining them, respectively. A brute force test for minimum distance would take O(m1·m2) time. By employing techniques from computational geometry [4],[10] one can improve this considerably. We compute the Voronoi diagram of all nodes and arcs [17] from the roads r1 and r2. A Voronoi region of a feature from r1 adjacent to a Voronoi region of a feature from r2 gives a pair of features that may give the minimum distance between r1 and r2. By testing all pairs of neighboring Voronoi regions and their features we can find the overall minimum.

Figure 2: Voronoi diagram of the nodes and arcs of two roads.

Such an algorithm to find the minimum distance takes only O((m1+m2)log(m1+m2)) time; it takes O((m1+m2)log(m1+m2)) time to compute the Voronoi diagram and it takes O(m1+m2) time to test adjacent regions. One can also use a heuristic to be even faster in most cases. Compute the bounding box B1 of r1, the bounding box B2 of r2, and test if these are at least distance d apart. Such a test requires only O(m1+m2) time in total, or only O(1) time if the bounding boxes have already been computed. But it sometimes fails to decide whether roads are too close, and we need a more time consuming algorithm in such cases.

The method we have designed for road network generalization has three stages. In the first, an initial selection of all roads is made based exclusively on the importance of the roads and the total number of conflicts. The second step is to test all pairs of roads for conflicts, and delete the least important of the two. The third step consists of adding roads, possibly of lesser importance, to avoid long detours. Care must be taken that no new conflicts are introduced here. We discuss the three stages in more detail.

First step: initial selection.

We assume that there is a fixed number of different importance values, for example, the integers 1 up to 5, where 5 is most important. In the first step we'll compute an integer i so that if all roads of importance i and up are selected, then there are not too many conflicts. For each possible importance value and implied selection, we determine how many roads are in conflict with some other road. We choose the smallest value i such that the percentage of roads in conflict is less than, say, 30%. This implies that in the second step we need only eliminate less than 30% of the roads to obtain a conflict-free road network. Probably only fewer roads need be eliminated, because when two roads are in conflict, we need to remove only one.

When testing whether a road r is in conflict with some other road, we start with the roads that bound the same face as r. As soon as we find a conflict we can stop the test for r and mark it as being in conflict. Because the test for conflicts excludes the parts of roads near the ends, it may be the case that r has no conflicts with roads bounding the same face, but has conflicts with roads that bound other faces. By cleverly choosing the sequence by which to test roads we can usually decide after a few conflict tests whether r has conflicts or not. A logical sequence is a breadth-first sequence starting from the road r. By using bounding boxes, it is possible to decide when it is safe to conclude that r is not in conflict.

An upper bound on the running time of the first step is O(knlogn), where k is the number of roads and n is the total number of nodes of the network. One can expect to do the first step much faster with the suggestions for efficiency just given.

Second step: eliminate conflicts.

The second step of the algorithm resolves all conflicts by eliminating one of each pair of roads in conflict. It is natural, and in accordance to the third requirement, to remove the road with smaller importance. In case of the same importance we can remove either road. When a road is removed, a junction can become a node of degree 2, which can imply that two roads are adjoined to one. One has the choice to make which importance to assign to the next road, if their importances were different.

Figure 3: When the vertical road is eliminated, the two other roads become one.

Elimination of conflicts should be done incrementally. When a conflict between two roads is resolved by elimination of one road, other conflicts with the eliminated road are resolved too, of course. But new conflicts may arise when two roads are adjoined, since the former junction, now normal vertex, no longer has a conflict-free region around it. This is the reason that we cannot precompute all conflicting pairs and simply remove one road of each.

We'll treat the roads in order of increasing importance. If the road under consideration is in conflict with some road of the same or greater importance, we eliminate it. If this makes a new, merged road to appear, we have to treat it too, even if both subroads have already been tested. Again, bounding box tests and a suitable order by which to test other roads will help considerably for efficiency.

The worst case running time of the second step is O(knlogn), and again we expect to do much better in practice.

Third step: avoid detours.

The third step of the generalization method is to add roads needed to avoid detours, but without creating new conflicts. This involves the roads eliminated in the first step, some of which may not have any conflict, but also roads eliminated in the second step. Such a road may have been eliminated because of a conflict with a road that itself was eliminated later in the second step. We choose to be conservative with adding new roads. They are added only if they really help to avoid a detour and don't have a conflict. One could also choose to always add roads that are not in conflict.

Note that detours are defined in terms of shortest paths in the input network, and detours are avoided only for the set P of important nodes. So we start by computing all shortest paths between pairs of nodes in Q in the input network. Using Dijkstra's algorithm [3] this takes O(mnlogn) time in total, where m is the number of important points and n is the total number of nodes and arcs in the network.

Most difficult in this step is how to generate roads that can be tested for inclusion in the road network. We do this as follows. For each important node p we compute its shortest path in the currently selected network to all other important nodes. If each path is at most c times the length of the corresponding shortest path in the input network, we don't have any detours from p and we are happy with the connections from p. If there is a serious detour to another important node p', we'll try to add roads to create a shorter connection. We take the shortest path between p and p' in the input network and consider which parts are not present in the currently selected network. We test each of these for conflicts with the selected roads, and add them if there are no conflicts. If we achieved to add all parts, the shortest path between p and p' is present in the selected network. If only some parts could be added, we may still have reduced the length of the shortest path between p and p' and possibly, avoided the detour.

For the analysis, we consider the running time for each important node that is tested. We compute the lengths of the shortest paths to every other important node in O(nlogn) time with Dijkstra's algorithm. For each of the at most m important nodes to which a serious detour exists, we may need to consider determining and adding at most k roads, where k is the total number of roads in the input network. For these roads we must test if they are not in conflict with any other road. This test takes O(nlogn) time for each road piece, leading to a worst-case running time of O(km2nlogn). But one can expect a running time somewhere between O(mnlogn) time and O(m2nlogn) time typically, if heuristics with bounding boxes and an appropriate order to test for conflicts are used.

If the method above does not avoid a detour sufficiently, it is certainly possible to generate other short paths and try to insert these. As noted before, however, it is not possible to obtain a guarantee on the detour as long as one insists on avoiding coalescence.

5. Short roads

The main problem with the method outlined above is how it deals with short roads. Any road that appears on the map with length less than 2d, for instance, will never be in conflict since it lies completely in the distance d-regions of its two ends. This implies that clusters of short roads may appear in the generalized network. We will discuss three possible solutions to the problem.

Solution 1: redefine conflict.

Short roads themselves are not really a problem, only a collection of many short roads close together is. This implies that the induced subdivision has many small faces. A similar observation can be made when we consider imperceptibility. A short is not imperceptible because it is part of the larger network, but a very small face in the subdivision will be imperceptible.

We can redefine conflicts for short roads as follows. Two roads are in conflict if both have length at most 2d and they are incident to some small face with area at most, say, 2d2.

The method described above will resolve the conflict by removing one of the conflicting short roads. This causes the small face to be merged with another face.

Solution 2: eliminate short roads at first.

In the first step of the algorithm, a preselection of the roads was made based solely on the importance. If fewer roads are preselected, then fewer short roads will appear in the generalized network, because roads are added only in the third step. Here the addition is only done if it helps to avoid a detour. Large clusters of many short roads aren't useful to avoid detours, one needs only a representative selection of connections. These will be established in the third step.

To achieve that the preselection is smaller, can adapt the first step by computing the smallest value i such that at least some percentage of roads are not in conflict, and at least some percentage of the roads is not too short. This may cause i to be greater than before (described in the first step). The motivation for this solution is that a cluster of short roads generally contains many that are less important.

Solution 3: use junction simplification first.

Perhaps the best solution to avoiding clusters of short roads is to apply a separate processing step where these clusters are collapsed to single junctions. The detection and collapse of clusters has been studied by Mackechnie and Mackaness [7], and our methods can be combined with theirs. Detection and collapse of clusters can be done beforehand, after which our generalization method can take over. It is also possible that our method is done first without giving special attention to short roads, as it is described in the previous section. Then the clusters of junctions can be detected and replaced by single junctions afterwards. A drawback of this third option is that the junction collapse methods don't deal with coalescence and detours explicitly.

6. Concluding remarks

This paper presented a new approach to decide which roads should be selected when generalizing a road network. The approach was designed while keeping three main objectives in mind: not allowing roads to be too close, avoiding detours between important points, and giving priority to bigger roads. These objectives led to a three-step approach that performs generalization. Several options and alternatives were suggested to tune or vary upon the approach.

The ideas outlined in this paper have been implemented partly and can be run via the World Wide Web. The Java applet can be found at http://www.jarno.demon.nl/thesis.htm. The implementation is a prototype that hasn't been optimized in running time. Therefore it is rather slow. Not all solutions suggested in this paper are included. It does allow tuning of several parameters, however, and gives an indication of possible outputs of the generalization method.

Figure 4: Data set used by the Java applet.

In this study of road network generalization, we have only chosen the selection operator as means of generalization. There are several other operators that can be useful: line simplification, line smoothing, merge lines, collapse a road (and merge the incident intersections), and local displacement of lines. This is a subject of further research. Another issue is a more extended study of dealing with short roads, especially the combination of junction collapse with our method.


The authors thank Robert Weibel and Frank Brazile for helpful comments. This research is partially supported by the ESPRIT IV LTR Project No. 21957 (CGAL).


 [1] K.E. Brassel and R. Weibel. A review and conceptual framework of automated map generalization. Int. J. of GIS, 2:229-244, 1988.

 [2] B.P. Buttenfield and R.B. McMaster, editors. Map Generalization: Making Rules for Knowledge Representation. Longman, London, 1991.

 [3] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, Cambridge, MA, 1990.

 [4] Mark de Berg, Marc van Kreveld, Mark Overmars, and Otfried Schwarzkopf. Computational Geometry: Algorithms and Applications. Springer-Verlag, Berlin, 1997. http://www.cs.ruu.nl/geobook/

 [5] G.E. Langran and T.K. Poiker. Integration of name selection and name placement. In Proc. 2nd Int. Symp. on Spatial Data Handling, pages 50-64, 1986.

 [6] W.A. Mackaness and M.K. Beard. Use of graph theory to support map generalization. Cartography and Geographic Information Systems, 20:210-221, 1993.

 [7] G.A. Mackechnie and W. Mackaness. Detection and simplification of road junctions in automated map generalization. In ACSM/ASPRS Annual Convention & Exposition, Technical Papers Volume 1: Surveying & Cartography, pages 72-82, Bethesda, MD, 1997. ASPRS/ACSM.

 [8] R.B. McMaster and K. Stuart Shea. Generalization in Digital Cartography. AAG, Washington, D.C., 1992.

 [9] J.-C. Müller, J.-P. Lagrange, and R. Weibel, editors. GIS and Generalization - Methodology and Practice, volume I of GISDATA. Taylor & Francis, London, 1995.

 [10] F. P. Preparata and M. I. Shamos. Computational Geometry: An Introduction. Springer-Verlag, New York, NY, 1985.

 [11] K.S. Shea and R.B. McMaster. Cartographic generalization in a digital environment: when and how to generalize. In Proc. Auto-Carto 9, pages 56-67, 1989.

 [12] R.C. Thomson and D.E. Richardson. A graph theory approach to road network generalization. In Proc. 17th Int. Cartographic Conference, pages 1871-1880, 1995.

 [13] F. Töpfer and W. Pillewizer. The principles of selection. The Cartographic Journal, 3:10-16, 1966.

 [14] M. van Kreveld, R. van Oostrum, and J. Snoeyink. Efficient settlement selection for interactive display. In Proc. Auto-Carto 13: ACSM/ASPRS Annual Convention Technical Papers, pages 287-296, 1997. http://www.cs.ruu.nl/people/rene/settlement/index.html

 [15] P. van Oosterom. The GAP-tree, an approach to `on-the-fly' map generalization of an area partitioning. In J.-C. Müller, J.-P. Lagrange, and R. Weibel, editors, GIS and Generalization - Methodology and Practice, number 1 in GISDATA. Taylor & Francis, London, 1995.

 [16] R. Weibel. A typology of constraints to line simplification. In M.J. Kraak and M. Molenaar, editors, Proc. 7th Int. Symp. on Spatial Data Handling, pages 9A.1-9A.14, 1996.

 [17] C. K. Yap. An O(n logn) algorithm for the Voronoi diagram of a set of simple curve segments. Discrete Comput. Geom., 2:365-393, 1987.