Parallel Distributed Processing for Digital Terrain Analysis

Philip J. Rallings1, J. Andrew Ware1 & David B. Kidner2

1Division of Mathematics & Computing, University of Glamorgan, Pontypridd, Rhondda Cynon Taff WALES CF37 1DL.

2School of Computer Studies, University of Glamorgan, Pontypridd, Rhondda Cynon Taff WALES CF37 1DL.

Email: pjrallin@glam.ac.uk



1. Introduction

This paper discusses the issues involved in implementing visibility analysis on a distributed cluster of networked machines. The practicality of doing so is demonstrated by parallelising a line-of-sight algorithm for determining the visibility indices of entities (or objects) such as elevation vertices, buildings, or road centrelines on a digital terrain model (DTM). This may be a requirement of site selection for a contentious development, especially if visibility, or more specifically, visual intrusion is likely to be a key factor in gaining planning approval. As vast quantities of spatial data become available, particularly DTMs at larger scales and denser resolutions, the demand for parallel processing will inevitably increase. Past research has centred on specialised parallel processing hardware (e.g. the Transputer; Kidner et al., 1997); however, a model is presented here for parallelising algorithms on a multi-purpose parallel platform of multi-vendor machines.

One of the main application areas of parallel processing in GIS has been the field of digital terrain modelling, including terrain characterisation, feature extraction, DTM generation and triangulation, and visibility analysis (Kidner et al., 1997). Other examples of parallel algorithms in the GIS domain include map overlay and intersection, line simplification, cartographic name placement, and location optimisation.

Many of today's GIS have integrated digital terrain modelling functionality with traditional GIS functions to increase the range of applications possible. One such application is site selection, where it may be necessary to characterise the terrain as part of the process. For example, in wind farm planning (Kidner, 1997, 1998), the site selection algorithm must identify locations which have good exposure; are not too steep; have a predominantly westerly aspect (in the UK); and ideally have low visibility, particularly from urban areas. As an attribute of terrain position, visibility can be represented as a visibility index. The visibility index of a point is defined as the number of entities which can be seen (i.e. which lie within an unobstructed line-of-sight, or LOS) from that point. Entities might include buildings, postcode centroids, road centreline vertices, or most commonly, other surveyed heights within the terrain model. This inevitably means that uni-processor systems provide poor response times when calculating the visibility indices of more than just a few locations, as is required for site selection. Typically, a possible site may be a polygon which incorporates hundreds or thousands of DEM vertices, each of which could be characterised with a visibility index calculated from hundreds of thousands of other DEM vertices or lines-of-sight. In these circumstances parallel processing techniques can be used to enhance the benefits delivered by the GIS.

The increasing availability of computer networks, combined with recent advances in modern PC operating systems, means that many organisations already have an existing multi-purpose parallel processing resource which could be utilised to carry out such tasks. In this paper, the authors present a new approach for parallelising the visibility index operation on a cluster of Pentium-based workstations. Results for an inter-visibility study of the South Wales valleys are presented, with particular reference to an application for determining the most suitable locations for siting wind turbine generators (WTGs) within a pre-defined area.


2. Visibility Analysis

For visibility analysis, the general problem is to identify the locations on the surface (or viewshed) which can be seen from an observer location. The most common approach to representing the terrain surface is as a regular grid digital elevation model, or DEM. A DEM records the elevation of the surface at fixed intervals of x and y, such as 10 metres for Ordnance Survey (O.S.) 1:10,000 scale data, 50 metres for O.S. 1:50,000 scale data, or 30 metres for United States Geological Survey (U.S.G.S.) 1:24,000 scale quadrangles. The advantage of using such a data structure for representing the terrain is that the uniform grid minimises search operations, since the co-ordinates of an elevation are implicitly related to its matrix position. Algorithms are also conceptually very simple when applied to the regular grid.
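To illustrate this implicit addressing (the constants and function names below are ours, following the O.S. 1:50,000 example above):

    #define SPACING   50.0   /* grid interval in metres (e.g. O.S. 1:50,000) */
    #define X_ORIGIN   0.0   /* easting of the south-west corner  */
    #define Y_ORIGIN   0.0   /* northing of the south-west corner */

    /* The co-ordinates of vertex (row i, column j) follow directly from
     * its matrix position - no search or stored co-ordinates are needed. */
    double dem_easting(int j)  { return X_ORIGIN + j * SPACING; }
    double dem_northing(int i) { return Y_ORIGIN + i * SPACING; }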

A profile of the terrain from the object to each DEM vertex (or observer position) is constructed and is classified as being "line-of-sight" or not. A line-of-sight profile simply implies that the object is visible from the observed location. The profile of the terrain can be reconstructed using a number of different algorithms and a variety of interpolation techniques (Kidner & Dorey, 1997; Weibel & Kidner, in press). The most popular approach is to estimate the profile elevation using a linear interpolation algorithm, usually between the two vertices of an intersected grid cell. As each elevation is interpolated, the angle subtended to it from the observer is checked against the angle subtended to the object from the observer. If the profile point's angle is greater than the object's, the profile is classified as not being line-of-sight (figure 1).

Figure 1 - Determination of whether a profile is line-of-sight (LOS) or not.

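By way of illustration, the following C sketch implements the line-of-sight test just described, under some simplifying assumptions: the DEM is held as a square row-major array, positions are expressed in grid units, the profile is sampled roughly once per cell crossed using bilinear interpolation, and bounds checking is omitted. All of the names (dem, line_of_sight, etc.) are ours, not those of the actual implementation.

    #include <math.h>

    #define DEM_SIZE 441                   /* vertices per side (22 km at 50 m) */

    extern float dem[DEM_SIZE][DEM_SIZE];  /* the elevation matrix */

    /* Bilinear interpolation of the surface at fractional grid position
     * (x, y); bounds checking omitted for brevity. */
    static float surface(float x, float y)
    {
        int   ix = (int)x, iy = (int)y;
        float fx = x - ix, fy = y - iy;

        return dem[iy][ix]         * (1 - fx) * (1 - fy)
             + dem[iy][ix + 1]     * fx       * (1 - fy)
             + dem[iy + 1][ix]     * (1 - fx) * fy
             + dem[iy + 1][ix + 1] * fx       * fy;
    }

    /* Returns 1 if the object at (ox, oy, oz) is visible from the observer
     * at (vx, vy, vz), 0 otherwise.  The gradient (tangent of the vertical
     * angle) of each interpolated elevation is compared against that of
     * the object; tangents preserve the ordering of the angles. */
    int line_of_sight(float vx, float vy, float vz,
                      float ox, float oy, float oz)
    {
        float dx = ox - vx, dy = oy - vy;
        float dist = sqrtf(dx * dx + dy * dy);
        int   steps = (int)dist, i;
        float obj_grad;

        if (dist < 1.0f)
            return 1;                      /* adjacent vertices: visible */
        obj_grad = (oz - vz) / dist;

        for (i = 1; i < steps; i++) {
            float t = (float)i / steps;
            float z = surface(vx + t * dx, vy + t * dy);

            if ((z - vz) / (t * dist) > obj_grad)
                return 0;                  /* the terrain blocks the profile */
        }
        return 1;
    }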

Each vertex of the DEM is classified in this manner to determine the binary viewshed, i.e. the areas from which the object is visible or not visible. When there is more than one object to consider, a cumulative viewshed can be calculated which records the number of objects which can be seen from each vertex of the DEM. Cumulative viewsheds are often represented by maps which band together the visible regions.

It is apparent that the processing workload becomes considerable as more objects are included within the cumulative viewshed calculation. The actual workload will depend upon the geographical extent of the area under consideration and the resolution of the terrain database. It is common for an Environmental Impact Statement of a proposed development to consider the viewshed over an area extending up to 20 km from the site.

However, up until now we have only considered the case where we have pre-defined co-ordinates for the landscape objects, such as wind turbine generators (WTGs). The problem can be further extended to one of site selection, which requires additional information. For example, we may use our GIS to determine the optimal site for a wind farm according to a large number of criteria (Kidner, 1998), but we will often have additional local criteria for optimising performance with respect to individual wind turbine placement within the site. These criteria may include maximising the resource potential by modelling the wind flow over the terrain; locational constraints, such as WTGs being at least 250 metres apart; and of course visibility constraints, such that visual intrusion is minimised.

In the site selection process, the locations which are chosen for calculating the visibility index are not predetermined. The simplest scenario is to calculate the visibility index for each DEM elevation, or alternatively at fixed intervals at the sub-cell level within the site. In practice, however, typical visibility index values would be quoted in terms of tens of thousands of DEM vertices, with indices calculated for hundreds or thousands of possible locations. To illustrate the workload involved, if the visibility indices for each vertex in a 20 km x 20 km DEM are calculated as the cumulative line-of-sight to every other point in the DEM, the processing time required would be 299 days at a resolution of 50 metres (1:50,000) or 508 years at a resolution of 10 metres (1:10,000), assuming one millisecond per profile. These estimates take no account of the larger number of interpolations required at the 10 metre resolution, which would increase them still further.
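These estimates follow directly from the profile counts, assuming the one millisecond per profile quoted above:

    50 m grid:  401 x 401 = 160,801 vertices;
                160,801 x 160,800 ≈ 2.59 x 10^10 profiles ≈ 2.59 x 10^7 seconds ≈ 299 days.
    10 m grid:  2,001 x 2,001 = 4,004,001 vertices;
                4,004,001 x 4,004,000 ≈ 1.60 x 10^13 profiles ≈ 1.60 x 10^10 seconds ≈ 508 years.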

This illustrates the need for a parallel algorithm to address this type of application. Whilst parallel processing will make some inroads into the development of a more efficient approach to calculating DEM visibility indices, the problem may also have to be simplified in order to make this type of analysis viable for inclusion within the broader confines of a large scale site selection GIS. This could simply include reducing the number of observation and / or target points, or even using a multi-scale approach which begins at a coarse resolution and is refined at higher resolutions where there appear to be sharp changes in visibility.

The solution to obtaining increased hardware performance is theoretically quite simple: either design a faster central processing unit or use more of them (Strand, 1992). Despite the perpetual upgrading of computer technology and design, the demands of some GIS applications have grown at an even faster rate. In these instances, we need to consider the alternative option of using multiple processors, either within the same hardware platform or across a distributed network.


3. Parallel Architectures

Modern parallel processing architectures have evolved into two broad categories based upon their architectural differences: shared memory and distributed memory systems.

Shared memory parallel systems link multiple processors via a common memory and system bus. Whilst processing is distributed amongst the available processors, each processor still shares core services, such as memory, disk and operating system resources / functions. Communication between processors is achieved via the shared or common memory. Shared memory systems span small to large-scale parallelism, from dual or quad processor systems (Pentium or SPARC) to the Connection Machine. Modern systems of this type implement Symmetric Multi-Processing (SMP) and use a thread-based model for parallel processing. SMP machines generally incorporate multiple processors within a single unit / machine.

Distributed parallel processing systems include a much broader spectrum of machines and devices and are open to a wider scope of interpretation. Distributed parallel systems involve linking a set of processing elements, each of which has a certain level of autonomy with respect to memory and other system resources. Processing elements are connected to other processing elements via dedicated point-to-point links or a common communication bus. Parallel systems of this type include the transputer parallel sub-system, where a network of transputers (each with its own memory) is linked to a host system which provides some shared resources, i.e. user input / output, disk / file access and operating system functions. Inter-process communication is achieved using a message passing model, where information / data is sent between communicating processors via the inter-processor links or communications bus.

However, distributed parallel systems have acquired a much broader interpretation, especially with the growth of computer / local area networks (LANs). In this sense a distributed system can include single processor machines connected via a common interface / LAN to provide a parallel processing cluster, a resource that can be exploited as a virtual machine. Each individual machine in a cluster is an independent system in its own right, with total control over its own resources (memory, disk, etc.). The adoption of distributed or cluster computing has brought parallel processing to a wider audience, so that its benefits can be achieved without the need for expensive, specialised high performance parallel processing equipment; this is especially true in the case of GIS.


4. Parallel Workstation Cluster

Distributed computing is well established in the business and commercial sector, mainly due to the introduction of networks, network operating systems and distributed databases. The scene is thus set for the benefits of distributed computing to be applied to GIS, especially in the area of digital terrain analysis and other process-intensive applications.

The main advantage of using a cluster is that many organisations already have a network of fast processor machines (e.g. Pentium PC-based workstations). A network of uni-processor machines can be exploited as a non-specialised distributed parallel processing resource, and significant speed-up can be achieved with only a few machines. A parallel cluster of machines (or workstations) can provide an overall system that is powerful, scalable and well structured (Bell, 1991). One of the main reasons for this is the newer, faster generation of processors (especially the Pentium II), and it is the speed of these modern processors (figure 2) that makes a parallel cluster attractive for use in GIS.


Figure 2 - Processor Generation Relative Speed & Performance

A parallel cluster can connect machines of varying speeds, architectures and capabilities, and can include specialised hardware. Clusters are generally classified into two categories, homogeneous or heterogeneous. Homogeneous networks connect machines of the same architecture and are therefore much simpler to support and organise in terms of data and application programs. Heterogeneous networks, on the other hand, link machines of different architectures (vendors), which introduces new issues when designing a distributed system, the main factor being coherent and understandable protocols for process creation and communication.

A number of software packages are available for implementing a distributed system across a wide range of machines (Dongarra et al., 1993). These include PVM, P4, Express and Linda. The Message Passing Interface (MPI) standard is also well established and governs inter-process communication procedures in a message passing environment. To be of practical use, the software must support a number of different operating systems, architectures and machine manufacturers. Having a common system helps in the process of software development and design.


5. Parallel GIS Issues

A distributed cluster of workstations provides a powerful yet flexible parallel processing system; however, several issues must be addressed when implementing a distributed GIS, especially on a heterogeneous network.

When designing a distributed system the key issues are the computational model, task management, data management and system communication. The degree to which each of these areas affects a GIS is determined by the scale and complexity of both the network cluster (including the communication network) and the computational model adopted.

The computational model chosen to solve a particular problem determines the nature and manner in which work is distributed across the processors in the network. This includes the way the problem is parallelised, using either algorithmic or domain decomposition, and extends even further to the strategy for work allocation. Algorithmic decomposition exploits the parallelism inherent in the algorithm. Domain decomposition exploits the fact that the algorithm can be applied to different parts of the problem domain in parallel. An efficient parallel implementation must maximise the proportion of time the processors spend performing necessary computation. An imbalance will result in some processors standing idle while others struggle to complete their allocated work, thus limiting potential performance. Load balancing is therefore a major issue in a distributed GIS, and techniques must be introduced to provide an even division of computational work (algorithmic or domain) across all processors. In doing so, two approaches can be identified: the data driven and demand driven models of work allocation or decomposition. The data driven scheme corresponds to a predetermined (static) work division, while a demand driven system allocates work dynamically as computation proceeds. A hybrid scheme can also be implemented, utilising the appropriate technique as the problem demands.

Task management involves the creation and organisation of all processing tasks / processes in the system. The aim of task management is to ensure that all processing nodes are kept busy while the overall problem remains to be solved. The task management strategy is closely linked to the computational model: a data driven model requires explicit load balancing, in which all work assignments are allocated to processes before computation proceeds, whereas a demand driven model introduces an extra overhead to ensure that the processing nodes never run out of work until the entire problem has been solved. Task management involves task definition; task control and allocation; and task distribution and the collation of processed results and data. Task management is further complicated on heterogeneous networks, where each machine may support different operating system calls for process creation, and where the allocation of work to processors of different speeds must be taken into account.

Data management involves delivering the required data to the processors as and when required, and is of great importance in a GIS application. At the simplest level, if all processors can hold all the data in memory, data management simply involves sending all the data (the DTM) to all processing nodes before computation starts; all nodes then have access to all the data at any time. This is the ideal situation; however, problems should not be restricted to those that fit completely within every processor's local memory. The data may instead be distributed across the entire system, or even across secondary storage devices. For this class of application some form of data management is a priority, to ensure that data items are available as processing continues. Virtual shared memory regards the whole problem domain or data as a single unit in which all data items may be individually referenced; this is precisely how the data would be treated in a shared memory system, but here the domain data is distributed across the entire system, hence the term virtual. The data management strategy may also include routing processes to support the dissemination of data to individual processing nodes.

The performance of a distributed system is affected by the communication system and the level of inter-process communication. Two of the important components of a message transfer system are the processor interconnection network and the communication protocols. In a distributed system using a message passing communication structure, communication overheads inhibit overall system performance. The communication system plays a crucial role in reducing implementation penalties and in improving the scalability of a parallel solution. This is especially true for GIS applications, which can involve the communication of a large data set to some or all of the processors. Special attention is therefore required when designing a distributed GIS so that the communication overhead is kept to a minimum.

In a GIS the data management and system communication strategies are closely related, especially when implemented on a cluster of networked machines. Operating systems offer file sharing capabilities across a network (or communication system); therefore data (and files) can be shared between machines in such a way that each processor can read, or has access to, a shared file containing all the data. This reduces the need to communicate data between processes and for sophisticated data management strategies. In a homogeneous network a minimum of one data file is required and shared by all processes, and inter-process communication can be at the hardware level since all machines have the same hardware architecture. However, in a heterogeneous network combining machines with different low level characteristics, a minimum of one file per architecture type is required, and any inter-process communication must be understood by all participating machines. In the latter case, an external data format is required so that information is transmitted in an architecture-independent form; an example is the use of External Data Representation (XDR).
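For example, under PVM (the system adopted in section 6) the encoding is chosen when the send buffer is initialised. A minimal sketch, in which the function name, message tag and data layout are our own assumptions:

    #include <pvm3.h>

    #define TAG_DATA 9

    /* Send a block of DTM elevations to one worker.  PvmDataDefault
     * encodes the buffer with XDR, so it can be unpacked on any
     * architecture in the cluster; PvmDataRaw would skip the conversion
     * and is only safe between identical machines. */
    void send_elevations(int worker_tid, double *elev, int npoints)
    {
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&npoints, 1, 1);
        pvm_pkdouble(elev, npoints, 1);
        pvm_send(worker_tid, TAG_DATA);
    }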


6. Parallel Cluster Implementation

The cluster was established using twenty-four 100 MHz Pentium-based machines connected via a standard 10 Mbps Ethernet network. The workstations run the Microsoft Windows NT 4.0 Workstation operating system with remote shell login services. The distributed GIS application was developed using the PVM (Parallel Virtual Machine) system and the C programming language. PVM has been compiled on over 40 platforms, including specialised and non-specialised parallel processing hardware, and a Java implementation of PVM exists for distributed processing over the Internet. PVM is ideal for code generation, since applications are portable across the PVM system regardless of machine architecture. The PVM system provides routines for managing a distributed system, including process creation and communication using the message passing model, and therefore dictates, to an extent, the approach to task management and system communication. System communication is via a common standard-speed Ethernet network for which all machines compete for transmission time. In developing a distributed system, attention must be paid to either reducing or synchronising inter-process communication, and thus circumventing any communication bottlenecks.

Task management is accomplished by the PVM routines, which impose no overall control on process creation or communication. All machines in the system have the ability to create sub-processes with which to communicate. In addition, it is possible for all processes to communicate with each other, although this may not be desirable from a design or control point of view.

Figure 3 - Single Master Node / Process Model

In this model (figure 3) the master process creates and communicates with all other processing nodes, including sending initial configuration parameters, distributing the computation workload and collating results. While this model keeps process control simple, it has an inherent drawback in that all processes communicate with one master node, and hence a potential communication bottleneck. However, as the number of processing nodes increases, extra communication nodes (sub-masters) can be introduced to share the communication burden. When and how these extra nodes are added depends upon the number of processing nodes and the data management strategy.
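A minimal sketch of this single-master model under PVM follows; the worker executable name, message tags, region bounds and result sizes are illustrative assumptions rather than the actual implementation.

    #include <pvm3.h>

    #define NWORKERS    24
    #define TAG_CONFIG   1
    #define TAG_RESULT   2
    #define NREGION   1681                    /* 41 x 41 region vertices */

    int main(void)
    {
        int tids[NWORKERS], indices[NREGION], partial[NREGION];
        int region[4] = { 200, 200, 240, 240 };   /* region bounds (assumed) */
        int i, j, nproc;

        for (j = 0; j < NREGION; j++)
            indices[j] = 0;

        /* create the worker processes across the virtual machine */
        nproc = pvm_spawn("visworker", (char **)0, PvmTaskDefault, "",
                          NWORKERS, tids);

        /* send the initial configuration parameters to every worker */
        for (i = 0; i < nproc; i++) {
            pvm_initsend(PvmDataDefault);
            pvm_pkint(region, 4, 1);
            pvm_send(tids[i], TAG_CONFIG);
        }

        /* ... distribute the workload (see section 7) ... */

        /* collate the results: everything funnels through this one master */
        for (i = 0; i < nproc; i++) {
            pvm_recv(-1, TAG_RESULT);
            pvm_upkint(partial, NREGION, 1);
            for (j = 0; j < NREGION; j++)
                indices[j] += partial[j];     /* merge one worker's counts */
        }
        pvm_exit();
        return 0;
    }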

The simplest form of data management is for all processing nodes to have access to the whole DTM. In the system under consideration it is possible for all machines to hold a copy of a considerably large DTM; where this is not possible, a hierarchical structure or virtual shared memory system is required. For the visibility analysis, all processing nodes receive the whole DTM via a shared file; the remaining data management tasks are sending the configuration parameters to all processes and collating the returned results. This is a simple but effective form of management that imposes few implementation penalties on the overall system.

6.1 Visibility Index Algorithm

The parallel computation model is closely related to the application / algorithm, and not to the overall distributed system or software. Parallel processing involves either splitting the process into several sub-processes and performing them on different processors concurrently (algorithmic decomposition), or splitting the data that is to be processed between a number of processors and executing multiple copies of the process simultaneously (domain decomposition).

In the case of the visibility algorithm, the number of distinct tasks is relatively few (figure 4). The processing overheads are due to the sheer quantity of possible LOS profiles to be analysed. Calculation of the profile is relatively straightforward, depending on the algorithm used and the type of interpolation employed (e.g. linear, biquadratic, bicubic, etc.). For the purpose of calculating visibility indices, linear or bilinear interpolation is generally considered sufficient. The algorithm of figure 4 assumes that the visibility indices are to be calculated for all DEM vertices within a defined visibility region (polygon) to all other DEM vertices within a zone of visual influence (ZVI).

Figure 4 - Algorithm for Determining the Visibility Index of a DEM.
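In outline, and reusing the hypothetical line_of_sight() and DEM_SIZE from the sketch in section 2, the computation is two nested scans; the region origin and the object / observer heights below are again illustrative:

    #define REGION   41            /* 41 x 41 region vertices (2 km at 50 m) */
    #define RX0     200            /* region origin within the DEM (assumed) */
    #define RY0     200
    #define OBJ_H  45.0f           /* object height, e.g. a WTG hub (metres) */
    #define EYE_H   1.7f           /* observer eye height (metres) */

    extern float dem[DEM_SIZE][DEM_SIZE];
    extern int line_of_sight(float vx, float vy, float vz,
                             float ox, float oy, float oz);

    /* For every vertex of the visibility region, count the ZVI vertices
     * from which it can be seen. */
    void visibility_indices(int index[REGION][REGION])
    {
        int x, y, tx, ty;

        for (y = 0; y < REGION; y++)
            for (x = 0; x < REGION; x++) {
                float ox = (float)(RX0 + x), oy = (float)(RY0 + y);
                float oz = dem[RY0 + y][RX0 + x] + OBJ_H;
                int count = 0;

                for (ty = 0; ty < DEM_SIZE; ty++)      /* every ZVI vertex */
                    for (tx = 0; tx < DEM_SIZE; tx++)
                        count += line_of_sight((float)tx, (float)ty,
                                               dem[ty][tx] + EYE_H,
                                               ox, oy, oz);
                index[y][x] = count;
            }
    }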

The visibility algorithm therefore dictates that a domain decomposition strategy should be employed to segment the data such that each processor is given an equivalent amount of work, thus minimising idleness and producing a load-balanced system. In this respect, the equivalent amount of work should be measured in time rather than volume, with quicker processors being assigned a greater amount of work. The assignment process would be influenced by some measure of machine performance relative to the other machines or to a base machine; this would enable faster resources to be exploited whilst minimising any implementation penalty.


7. Domain Decomposition Strategies

The parallelising of the line-of-sight function is clearly a problem of domain decomposition, which can be approached in two ways: applying all of the vertices of the visibility region (for which visibility indices are to be calculated) to part(s) of the DEM, or applying subset(s) of the visibility region to all of the DEM.

Figure 5 - Domain Decomposition Strategies

The domain can be decomposed using either a data or demand driven computation model.

The data driven model allocates the entire domain to specific processing elements before computation commences. Each processor thus knows in advance the static sub-set of the domain upon which the algorithm is to be applied. Data driven models can be balanced or unbalanced in dividing the domain between the available processors. In balanced data driven systems (also known as geometric decomposition) an equal area of the domain is allocated to each processing element. As no further allocation takes place after the initial distribution, a balanced workload is only achieved if the computational effort associated with each domain item is identical. If not, some processing elements will have finished their portions whilst others have not; thus the balanced model should only be used if the computational workload required by each data item in the domain is the same. The balanced data driven model is the simplest to implement but generally does not produce efficient load balancing. The unbalanced data driven model allocates the domain to processors based upon the computational requirements of the domain data; rather than simply apportioning an equal number of tasks to each processor, the domain is allocated to ensure that each processor will complete its portion at approximately the same time. Whilst this method will produce significantly improved load balancing, it requires a priori knowledge of the domain and the processing requirements of each data item.

In a demand driven model, work is allocated to processors dynamically as they become idle. Work can be allocated as single data items or in smaller blocks / areas of the overall domain. The demand driven model does, however, introduce an extra communication overhead as processors return results from one assignment and request another. A major factor in this overhead is the granularity, or size, of the workload assigned to each processor. If the granularity is too small, a great deal of inter-process communication is generated; if the granularity is too large, the division of work approaches that of the data driven model and any benefits are lost. In practice there is some leeway in assigning the granularity of work, and there will be an optimal granularity of work assignment for a given domain / application. The demand driven model facilitates dynamic load balancing when there is no a priori knowledge of the complexity of the different parts of the problem domain. An initial investigation into both data (static) and demand (dynamic) driven allocation algorithms is already available (Rallings et al., 1998). Current research centres on two data driven and two demand driven algorithms. The demand driven approaches were tested using different levels of granularity in an attempt to ascertain the optimum size of work distribution. Full results have been processed for 1 to 24 processors, and the different algorithms are compared with respect to overall speed-up and processor efficiency.



Data Driven Decomposition

LINE - divides the DTM into complete rows (or lines), where each processor is allocated an equal number of lines. Processing is spread out over the entire DTM, which should achieve better load balancing.



BLOCKS - divides the DTM into a number of smaller, equal-sized blocks of rows, where each processor is allocated an equal number of blocks. Processing is concentrated on a few smaller blocks distributed throughout the whole of the DTM. (A sketch of both static divisions is given below.)
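As a minimal illustration of the two static divisions (the function names are ours; rows and blocks are dealt out in round-robin fashion, consistent with the descriptions above):

    /* LINE: deal complete DTM rows out one at a time, so that each
     * processor's rows are interleaved across the whole DTM. */
    int line_owner(int row, int nproc)
    {
        return row % nproc;                  /* row i -> processor i mod p */
    }

    /* BLOCKS: deal equal-sized bands of rows out in the same round-robin
     * fashion, so each processor receives several blocks spread through
     * the DTM.  LINE is the special case rows_per_block = 1. */
    int block_owner(int row, int rows_per_block, int nproc)
    {
        return (row / rows_per_block) % nproc;
    }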



Demand Driven Decomposition

NPOINT - farms the DTM vertices to processors n points at a time, where n is the granularity and is a parameter to the system.




FARML - farms the visibility region vertices n points at a time, where n is the granularity and is a parameter to the system. A granularity of n = 1 is equivalent to farming individual vertices. (The master side of such a farm is sketched below.)
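The master side of such a farm might look as follows under PVM; the message tags, helper function and stopping protocol are our own illustrative assumptions:

    #include <pvm3.h>

    #define TAG_WORK    3
    #define TAG_RESULT  4
    #define TAG_STOP    5

    /* Send one assignment: `count` vertices starting at index `first`. */
    static void send_work(int tid, int first, int count)
    {
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&first, 1, 1);
        pvm_pkint(&count, 1, 1);
        pvm_send(tid, TAG_WORK);
    }

    /* Prime each worker with one assignment of n vertices, then hand out
     * the next n whenever a worker reports back, until none remain. */
    void farm(int total, int n, int nproc, int tids[])
    {
        int next = 0, busy = 0, i;

        for (i = 0; i < nproc && next < total; i++, busy++) {
            send_work(tids[i], next, (next + n <= total) ? n : total - next);
            next += n;
        }
        while (busy > 0) {
            int bufid, bytes, tag, tid;

            bufid = pvm_recv(-1, TAG_RESULT);    /* a worker is now idle */
            pvm_bufinfo(bufid, &bytes, &tag, &tid);
            /* ... unpack and merge this worker's partial results ... */
            if (next < total) {
                send_work(tid, next, (next + n <= total) ? n : total - next);
                next += n;                       /* keep the worker busy */
            } else {
                pvm_initsend(PvmDataDefault);
                pvm_send(tid, TAG_STOP);         /* no work left: release it */
                busy--;
            }
        }
    }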





8. Parallel Implementation Comparison Indicators

Some measure of parallel performance is required to evaluate the success of any parallel implementation; for this purpose two comparison indices were observed for each work dividing algorithm: speed-up and processor efficiency. Speed-up relates the time taken to compute a task on a uni-processor system to the time taken to compute the same problem using the parallel implementation. Speed-up on n processors is thus defined as

    S(n) = T(1) / T(n)

where T(1) is the elapsed time on a single processor and T(n) is the elapsed time on n processors.

The relative efficiency, based on the performance of the problem on one processor,

    E(n) = S(n) / n

can be a useful measure of the percentage of each processor's time that is spent on computation. Any implementation penalty is immediately reflected by a decrease in the relative efficiency.

In terms of comparison, the relative efficiency indicator is the major factor, since it has an implied time element in terms of implementation overheads and shorter execution. The efficiency factor also gives an indication of how effectively the decomposition algorithm balances the load; for example, a speed-up of 22 on 24 processors corresponds to a relative efficiency of almost 92%.


9. Testing and Results

The parallel visibility algorithms were applied to the problem of determining the visibility indices of a 2 km x 2 km visibility region of a larger 22 km x 22 km DEM of South Wales (the zone of visual influence, ZVI). The region is the location of the Taff Ely Wind Farm, 15 km north-west of Cardiff, and is the subject of an on-going investigation into the performance of GIS tools for the visibility analysis of wind farms (Kidner, 1997; Dorey et al., 1998). The DEM was sampled at a resolution of 50 m; hence the indices were calculated for 1,681 points (41 x 41) as the number of line-of-sight profiles to every other DEM vertex (194,481 points, i.e. 441 x 441 vertices). This is equivalent to nearly 327 million profile calculations, each of which could extend to up to 1,000 interpolations. Figure 6 illustrates the results of this analysis, and the location of the actual WTGs on the wind farm site. The results are tabulated according to the parallel computation model adopted, with an overall comparison of the best performing decomposition algorithms.

Figure 6 - Visibility Indices for a 2 km x 2 km Subset of the Taff Ely Wind Farm in South Wales.

(Index is defined as 1% (White) to 40% (Black) of the 441 x 441 DEM Vertices).


9.1 Data Driven Results

The results of applying the data driven, or static, decomposition are outlined in figures 7 and 8. Whilst all the data driven algorithms produced similar levels of speed-up relative to the number of processors, the efficiency is quite erratic, especially beyond 10 processors. This emphasises the difficulty of achieving a load-balanced system with a balanced data driven model. The peaks and troughs in algorithm efficiency are set to continue (if not increase) as extra processors are added. Comparing the results, the LINE algorithm achieved the better overall performance.

Figure 7 - Data Driven Speed-Up

Figure 8 - Data Driven Efficiency


9.2 Demand Driven Results

The demand driven algorithms were tested using different levels of granularity: FARML was tested using values of 5, 10, 20, 30, 40, 50, 60 and 70, whilst NPOINT was tested using granularities of 100, 200, 300, 400, 500 and 600. The results are displayed in figures 9 - 12.

Figure 9 - FARML Demand Driven Speed-Up

Figure 10 - FARML Demand Driven Efficiency


Figure 11 - NPOINT Demand Driven Speed-Up

Figure 12 - NPOINT Demand Driven Efficiency

The FARML allocation routine provided a range of both speed-up and efficiency values depending on the granularity of the work assignment. Most notable are the divergence in speed-up after 10 processors and the extremely erratic efficiency levels for the larger granularity settings. The best performing granularities were 5 and 10 points at a time, representing 0.29% and 0.59% of the total domain respectively. The latter achieved the best speed-up and maintained stable efficiency as the number of processors increased.

The NPOINT algorithm produced less divergent results, although its range of granularity is smaller. The range was 100 - 600 points, which is 0.062% - 0.372% of the domain, as opposed to 0.29% - 4.16% for FARML. Although the different granularities produced roughly the same speed-up, the 100 and 200 point values produced a better efficiency rating.


9.3 Demand Driven Communication Overhead

The communication overhead imposed by the demand driven algorithms is shown in figure 13 and demonstrates the effectiveness of the implementations. The overhead increases as more processors are added, so at some point the approach will become inefficient, although at present the results are promising even on 24 processors. The communication overhead (Annaratone et al., 1989) is measured as a percentage of the overall parallel implementation time (not the uni-processor time) and is a necessary burden to ensure an effectively load balanced system.

Figure 13 - Demand Driven Communication Overhead


10. Conclusion

The results have shown that load balancing is difficult with a balanced data driven decomposition. Demand driven strategies, on the other hand, produced better results and are far more resilient with respect to the processor network and load balancing. The main reason for this is that the domain must be divided into equal amounts of processing time; if the processing time is the same for all parts of the domain then simple geometric division of the domain into equal areas is sufficient. However, due to terrain-specific features, the time required to process a point varies considerably across the DTM. Balanced data driven algorithms are therefore unsuitable for a distributed GIS, and the case for adopting a demand driven approach is convincing.

The algorithms have been tested on a network of homogeneous processors of the same speed, but the problems of implementing data driven algorithms would be even more pronounced were heterogeneous processors of varying speeds added; the problems would grow with the varying degrees of processor / machine capability. An unbalanced data driven approach would produce significantly improved results, as the domain would be divided into areas of equal processing time. However, it is difficult to obtain such a priori knowledge unless the DTM is first processed with a view to ascertaining the processing requirements (time) of each data point. This could be achieved in a parallel environment; the results of this first pass would be used to divide the domain into exactly equal amounts of processing time for each processor. It is envisaged that this would produce results rivalling those of the demand driven models. However, in a heterogeneous environment the domain would have to be divided according to both processing time and processor speed / capability.

Demand driven implementations do not suffer from the same penalties, although attention is needed to keep the communication overhead to a minimum. It may be the case, however, that in a heterogeneous network the granularity proves more of a determining factor than in homogeneous networks, where there is considerable freedom in setting the granularity. In a heterogeneous network the granularity would probably be determined by the speed of the slowest processor. At present only a fixed level of granularity has been applied; it is feasible to use variable granularity, where the granularity is high at the start of execution and decreases with the remaining workload. This would have the effect of decreasing the communication overhead during the life of the system (especially in the early stages) and provide more efficient load balancing as workloads are allocated in smaller units. The starting level and reduction rate in this variable granularity environment are currently being investigated.
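One possible schedule for this variable granularity (akin to guided self-scheduling; the halving factor and the floor value are illustrative assumptions) is sketched below:

    /* Shrink the assignment size as the remaining workload falls: early
     * assignments are large (few messages), later ones are small (good
     * load balancing towards the finish). */
    int next_grain(int remaining, int nproc, int min_grain)
    {
        int g = remaining / (2 * nproc);     /* half of a "fair share" */
        return (g > min_grain) ? g : min_grain;
    }

The master would call such a function in place of the fixed granularity n each time a new assignment is farmed out.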

In addition, it is currently assumed that all processors have access to the entire DTM. Whilst this is feasible with the DTM under consideration, it may not always be the case. However, the algorithms presented in this paper remain valid, as a hierarchical system can be produced with additional control nodes. The multi-layered model (Dandamudi & Eager, 1990) would allow the DTM to be divided between additional control nodes, such that each sub-system reflects the overall system by applying the work dividing algorithms to its own local domain, a sub-domain of the overall DTM. This design (figure 14) would be capable of processing much larger DTMs, at even greater resolutions, and still present a viable, well structured distributed GIS.

Figure 14 - Multi-Layered Model with Sub-Division of DTM

The additional control nodes would certainly benefit the demand driven algorithms, even if each processor continued to have access to the entire DTM. The current research suggests that whilst one master node is sufficient for 24 processors, the management communication overhead is set to spiral as extra processors are added. The same is not true of the data driven algorithms, which involve little communication between the control and processing nodes / machines; with data driven algorithms, however, the emphasis is on designing a well balanced algorithm to exploit the full potential of the processing network.

Parallel processing of GIS problems is still in its infancy, despite the steady stream of academic conference papers which have addressed the issues over the last 10 to 15 years. The need for specialised hardware was certainly a major obstacle to its acceptance by a wider community outside academia. However, as most organisations now use networked machines, parallel processing has the potential to benefit a much wider audience. The majority of applications which need parallelising, such as visibility analysis, are conceptually very simple. The real problem is identifying the best strategy for assigning the workloads to machines so as to minimise both processor redundancy and communication overheads. This paper has shown that this visibility problem can be solved 22 times faster with 24 processors or, more generally, with a speed-up in excess of 90% of the number of processors assigned, thus reinforcing the scalability of distributed parallel processing systems.


References

Annaratone, M., Pommerell, C. and Ruhl, R. (1989) "Interprocessor Communication Speed and Performance in Distributed-Memory Parallel Processors", Proc. of the 16th Annual International Symposium on Computer Architecture, June 1989, pp. 315-324.

Bell, G. (1991) "Scalable Parallel Computers: Alternatives, Issues and Challenges", International Journal of Parallel Programming, 22(1), pp. 3-46.

Dandamudi, S.P. and Eager, D.L. (1990) "Hierarchical Interconnection Networks for Multicomputer Systems", IEEE Transactions on Computers, 39(6), pp. 786-797.

Dorey, M.I., Sparkes, A.J., Kidner, D.B., Jones, C.B. & Ware, J.M. (1998) "Calculating the Line-of-Sight on a DTM: Quantifying the Effect of DTM Scale and Topographic Features", Submitted Abstract, GISRUK'98, Edinburgh.

Dongarra, J.J., Geist, G.A., Manchek, R. and Sunderam, V.S. (1993) "Integrated PVM Framework Supports Heterogeneous Computing", Technical Report, Oak Ridge National Laboratory.

Kidner, D.B. (1997) "The Role of GIS in Wind Farm Planning", in Proc. of the 3rd Joint European Conference on Geographical Information, Vienna, April 16-18, pp. 843-852.

Kidner, D.B. (1998) "Geographical Information Systems in Wind Farm Planning", Chapter 9 of Geographical Information and Planning: European Perspectives, (Editors: S. Geertman, S. Openshaw & J. Stillwell), Springer-Verlag.

Kidner, D.B. and Dorey, M. (1995) "Visual Landscape Assessment of Wind Farms Using a Geographical Information System", Proc. of the 17th Annual British Wind Energy Association Conference, Warwick, July 17-19th, MEP Press, pp. 183-189.

Kidner, D.B. and Dorey, M. (1997) "What's the Point? Interpolation Algorithms Within a Regular Grid DEM", Computer Studies Technical Report, University of Glamorgan.

Kidner, D.B., Rallings, P.J. and Ware, J.A. (1997) "Parallel Processing for Terrain Analysis in GIS: Visibility as a Case Study", GeoInformatica, 1(2), pp. 183-207.

Rallings, P.J., Ware, J.A. and Kidner, D.B. (1998) "Parallel Distributed Processing for Terrain Visibility Analysis", Proc. of GIS Research UK (GISRUK '98), 31st March - 2nd April 1998.

Ware, J.A. and Kidner, D.B. (1991) "Parallel Implementation of the Delaunay Triangulation Using a Transputer Environment", Proc. of the 2nd European Conference on Geographical Information Systems (EGIS'91), Brussels, Belgium, April 2-5, Vol. 2, pp. 1199-1208.

Weibel, R. and Kidner, D.B. (in press) Digital Terrain Modelling in GIS, Oxford: Oxford University Press.