Xin Chang Zhang
City and Resource Planning Department, Zhongshan University, 135 Xingang Xi Road, Guangzhou, 510275 P.R.China.
Edge-matching is one of the most useful procedures in processing spatial data. But several frequently used methods have showed their shortcomings. They are blind to error data included in the original data, and most of them are sensitive to a threshold. An effective matching procedure should correct most of the errors included in the data, or should detect these errors which may not be easily corrected. side from this, all lines are matched successfully. Geometric feature-based edge-matching procedure is one such method which is applied to match contour line data from photogrammetric model. Because every type of geographical element has innate geometric features represented by the geometric data (coordinates). These data processed should demonstrate their geometric features. By comparing the geometric features of the data with the innate ones, we know which links are the best ones. So the geometric features of elements may be regarded as credible data in the process. procedure matching contour line with the test of its features can be converted to a search procedure. It is depth-optimised. The procedure looks for a correct path on a reversed "tree" and its nodes are the contour segments. Correct path gives out the sequence of linking these segments.
Edge-matching is one of the most useful techniques in the process of spatial data. A lot of data must be organised in a proper form to meet different needs. This technique is very often applied for building topological data, matching photogrammetric models, and combining several large scale maps into a small scale map, which are main contributors to the functions of GIS. There are several methods to implement edge-matching procedure for different situations. Node-matching method is usually used to build topological relation of Node/Edge; Zipping method is commonly used for matching lines between adjacent map sheets and matching photogrammetric model. The matching results are generally good if the original data included no error data. But if error data are included in the original data, this procedure will give unsatisfying results in which new errors may be added. In that case, it will take a lot of interactive work to correct these data (both the new and the old). However, geometric feature-based method is especially good at detecting errors.
Edge-matching is the process to determine which edges (lines) should be linked among candidates. For some cases, one edge will join with only other one and for some other cases, more than two edges will be linked together. It depends on the features (attributes) represented by data. Process of contour data is the former case and process of river or boundary is the later.
Edge-matching is a necessary process in organisation of data. It recovers the original states of the data. In capturing data, the same coordinate points may be digitized differently. One long line may be divided into several pieces. Besides, some errors (such as redundant lines, missing lines......) will inevitably happen in the course of data capturing.
These data mentioned above must be improved in this matching procedure so that these data are available to GIS. But ,edge-matching is an uneasy process which has been addressed by several authors. Although it is very simple for a human operator to do it because of human being's innate ability of recognition, it is particularly difficult to implement this procedure by a computer (Raymond J. Hintz and Mike Z. Zhao 1990). The difficulty is that there is no satisfying matching rules to be obeyed when the original data include some error data. From the point of applications, however, matching work should mostly be done by a computer. That is to say, we would choose a proper method to instruct a computer to work harder than an operator to do, rather than a simple method to let an operator instead of a computer to do extra jobs.
The matching results by a usual method (Raymond J. Hintz and Mike Z. Zhao 1990; Schen K.T.1986) largely depend on the data to be processed. The same procedure may be available to some sets of data. The usual methods are generally grouped as distance-based and feature-based method.
As known to us, here are no theoretical data to be referred in the matching procedure. We must find some matching rules (criterion) within the original data for determining matching lines. In common sense, distance between lines is the most possible criteria to be recommended. In most cases, the distance between two edges is simply defined as the minimum of distances between end points of different edges as figure1. Edges l1 and l2 have correspondingly end points n11,n12,and n21,n22. The distance D takes the following function:
D(l1,l2)=min{d(n11,n21),d(n11,n22),d(n12,n21),d(n12,n22)}.
This method is also order-related. Figure2 shows that case. For
a given threshold e, if n1 is regarded as a referred point, n1
will only join with n2 because of d(n1,n2)<=e and d(n1,n3)>e.
But if n2 is regarded as a referred point, (n1,n2,n3) will take
the same point which may be the original case. Besides, the above
defined distance may immediately show its disadvantages when these
edges to be processed have overlapped parts illustrated in figure3. Lines
l1 and l2 have a rather long overlapped part. It is obvious that
l1 and l2 are composed of one original line. Above method can
only give right result when a threshold is assigned a rather large
value which may cause a lot of confusing results.
The method can be improved by redefining the distance between
edges as following :
Nevertheless, this method is still sensitive to a threshold which
will be determined by no means of ease. Another disadvantage of this method is that it is blind to error
data included in the original data. These error data may interfere
the procedure to produce good results.
In a lot of cases, the attributes represented by these data is more important than their geometry. A road should be linked with a road, a river should be linked with a river. In matching contour from photogrammetric models, contour lines of the same height can only be linked. Thus, matching criteria in this method is that lines of the same attribute are likely to be linked no matter how far the distance might be. In fact ,this method is often mixed with the above distance-based method. First, a small number of candidate lines can be determined by the distances.
Zipping method is another very useful method in matching adjacent map sheets as well in matching adjacent photogrammetric models. It works like a zipper. The procedure begins with a standard line. Two side must have the same number of lines to be matched. The line on one side will be joined with the line on other side. But this procedure will get confused if the number of lines on one side different from that on other side. Any procedure will be misled if it processes a set of data with wrong attributes. However, in most actual situations, the original data do not include attribute errors but do include geometric errors, which is a reasonable prerequisite. The error data mentioned in this paper is referred to geometric errors.
It is known to us that any set of data to be processed are likely to include inevitable some error data or unexpected data. Matching procedure should have the procedure to process not only data without errors but also that with some errors. In other words, this procedure will detect existing errors.
It is very clear that no theoretical data to be referred in matching procedure. However, every geographical element (such as river, road, contour line and so on) has its innate geometric feature which can be demonstrated by the coordinates of the elements. So the geometric features trends to be applied to this procedure (relation between elements may also be included in the features). There may be a lot of geometric features for a type of element, some of which may not easily be implemented into a computer, and some of which may not be available the matching procedure. Choice of features for matching procedure depends on the type of element, namely on the data to be processed. This matching procedure determines linking edges by comparing the innate geometric features with ones shown by data. If the features shown by the data do not conflict with innate ones, these edges will be linked; otherwise they are not linking edges. Some of which are regarded as errors if they remain to conflict with innate features in matching.
The algorithms are going through in following steps. Supposing S is a set of edge candidates to be matched; STK is a stack for linking edges; TEP is a working buffer and SUP is a buffer for storage of suspicious edges. Then as following:
STEP 1: If S is empty, the procedure ends; STEP 2: To extract first edge from S and to push it into STK, to copy all edges from TEP; STEP 3: To test STK for its geometric features if the features do not conflict with the innate ones, turn to STEP 6.Otherwise turn to STEP 5; STEP 4: To test STK for terminating condition. If STK meets terminating condition, turn to STEP 8.Otherwise turn to STEP 6; STEP 5: To pop STK. If STK is empty, to resort S and free TEP, turn to STEP 1; STEP 6: If TEP is empty, suspicious edges are in STK, to copy STK to SUP, to empty STK and turn to STEP 8; STEP 7: To extract a edge from TEP, to push the edge into STK and turn to STEP 3; STEP 8: To extract these edges the same as in STK from S, to turn to STEP 1.
Choice of candidate edges for S are related to the processed data. These edges of the same attribute or in certain region can compose S. Usually, S will contain more candidates than the required. To test STK means to test all linked edges in STK. Terminating condition for STK depends on the type of data. Different geographical elements have different terminating condition. For example, closure is a terminating condition for a contour line. Testing is the most important step for this process, and it may take much computer time to finish the process. However, this testing will give out high percent of correct results, in which very few jobs is left to be interacted. STEP 6 can detect some error data or unexpected data and store them to SUP, in which some may be corrected or removed within the matching procedure or outside the procedure.
An example will give clearer description of the application of the method. In this case, we use the method to match contour lines from several photogrammetric models covering a map sheet. Figure 4 illustrates usual cases in overlapping region. Some contour lines are separated, in contrast, some contain large overlapping parts, some entangled and some are extra lines. Before going the procedure, we first look at the geometric features of contour lines. Contour lines on a map are geographical elements which have special geometric features. We choose two of them for the matching procedure. The two features are very useful and keys to the process. They are as following:
It is clear that these data to be processed can not demonstrate
the two geometric features. If we process the data by the two
features, we will get reasonable results. The first feature is
an ideal terminating condition for a contour line. That is, if
a contour line which might consist of several original lines is
closed or terminated at the margin of a sheet, this line is a
completed one and needn't to be further matched. The second feature
is a linking criteria. If a linked line intersects with some other
line, this line will be re-matched in the following processes.
Because the elevation of contours are included in the data, the
set S will consist of these contour lines of the same elevation.
For example, the candidate set S has five lines of the same elevation
which actually belong to a contour line in originality illustrated
in figure5. Matching procedure will determine the sequence for
linking lines. Let c1 be the starting line. We don't know that
c1 should join with which one among other four lines before test
of linking criteria. This matching procedure can be covered a
search procedure. The procedure looks for a right path (solution
node) in a reversed "tree". Only one branch is the required
one as thick arcs in figure 6.
If S contains a large number of elements (lines), it will take long time to go through all possible path to obtain the best linking sequence. Because, test of linking criteria, in which every point is going to be processed, is a time-consuming procedure. However, this test may be replaced by a simplified process which may reach the same results according to these contour data. A line may possibly join with any one of others. We would select the most possible one among the others before test of the link rather than determine the best link after test of all possible links. The search procedure is described as following:
Supposing H to be the set of linking lines after test, c<k> to
be current line on the layer K in the "tree" and S to
be a candidate set (Figure 7) .
Basing on c<k>, the most possible line in S will be chosen for the test .this line is on the layer k+1 under c<k>. After c<k+1> is removed from S, we test this link with linking criteria. If the link does not conflict with its geometric feature, c<k+1> is added to H and become the current line instead of c<k>. Otherwise, another most possible line in S is chosen. If S is empty, we will remove c<k>from H and return to the layer k-1. The line c<k-1> in H, which was once linked with c<k>, is the current line. If H is empty, c<k> is suspicious line and the lines in S will be modified for starting a new matching. The procedure will not end unless terminating condition for H is met.
In matching contour lines, a line can link two other lines at its two end parts. Thus, the search procedure will be double directed. That is, two lines are chosen for a given line. A new line is formed after test of linking criteria. Then ,basing on the longer line, another two lines will be found to be linked with this line. According to common sense, a line is more likely to be linked with nearer line than with a further line. The distance between two lines can give out the best linking possibilities. The less the distance is, the larger the linking possibility is. The distance is here defined as minimum of ones between all points of two lines, which is still take much of computing time. But, we will simplify this computation by computing the distance between end parts of two lines according to the contour data from photogrammetric models
An available assumption will answer how many points of a line
are used to compute the distances. This assumption is that an
overlapping part of a line is proportion to the length of whole
line and is probably not to excess the square root of the whole
length. That is, (n the number of points
of a line ). This assumption makes this matching procedure quite
simple both in testing linking criteria and in linking lines.
For example, if two lines have 10 points, two parts of each line
will be computed. So the ratio of actual computations to the original
ones is
. If points are increased to 100, the
ratio will be
. This example shows the advantage
of using this simplified computation when contour lines have a
large number of points.
Geometric feature-based method is reliable because geometric features of elements is constant. This method overcome the shortcoming of no referred data in matching procedure. The results depend on these geometric features defined in the matching procedure. The procedure determine linking edges by comparison of computed features from data with innate ones. The more features the procedure has, the better results it gives out. But, it will take very long time to finish the test process when the procedure contains too many features. Thus, we should choose the most important features for the procedure to avoid spending a lot of unnecessary computing time. Also in the course of testing features, we can use some simplified methods to reach almost the same results.
Raymond J. Hintz and Mike Z. Zhao.(1990). 'Demonstration of Ideals in Fully Automatic line Matching of Overlapping Map Data', Auto-Carto 9 Proceedings. p.118.
Schenk T.(1986)'A Robust Solution to the Line-Matching Problem in Photogrammetry and Cartography', Photogrammetric Engineering and Remote Sensing. Vol.52 No.11 p.1779-1784.
Greenfeld,J.S. and Schenk, A. F.(1989).'Experiments with Edge-based Stereo Matching', Photogrammetric Engineering and Remote Sensing.. Vol. LV No.12.
Li Denren and Chen Xiaoyong.(1990).'Automatic Rater-Vector Converting From CCD Scanned Topographic Contour lines', International Archives of Photogrammetry and Remote Sensing.VoI.28-3/2,Wuhan.
Heleva, U. V.(1988).' Object-Space Least-Squares Correction', PE&RS. VoI.54,No.6.
Wu Hehai.(1991).'Cartographic Data Base System', Surveying and Mapping Press.
William A. Mackaness and Peter F. Fisher.(1987).'Automatic Recognition and Resolution of. Spatial Conflicts in Cartographic Symbolisation',AUTO-CART8 .p198-214.
Mackaness, W. A. and M. K. Beard.(1993).'Use of graph theory to support map generalization', Cartography and Geographic Information system.20(4): 210-215.
Li, Z. and B. Su.(1995).'Algebraic models for feature displacement in the generalization of digital map data using morphological techniques', Cartographica.32(3): 39-56.
Stan Aronoff. (1993).'Edge Matching', GIS: A Management Perspective. p 203-204.
Michael F. Goodchild and Gary J. Honter.(1997).'A simple positional accuracy measure for linear features' International Journal of Geographical Information Science Vol.11,No.3,299-306.
Environmental Systems Research Institute, Inc.(1995).'Managing
the database', Understanding GIS. Lesson 7 . 6-54.