Department of Geography,
Australian National University,
Canberra, Australia, 0200.
This paper presents a method that helps analysts understand the classification derived by any feedforward Artificial Neural Network (ANN).
Due to their non-linearity, hidden layer(s) and large number of connections, feedforward ANNs are seen as too complex for users to understand. How a result is derived is not easy to interpret from the weights matrix. There have been attempts to reduce the complexity of network construction by automating, as much as possible, the structure and training. However, this still does not enable the user to visualise what the network is doing, or why it is doing it. Such complexity means ANNs are used as black box classifiers, where information is fed in one end and a result appears at the other.
Using any GIS it is possible to replicate the network structure and visualise it as geographic entities rather than as symbolic nodes. By visualising network training using the geographic nature of the data it becomes possible to gain an understanding of how a result is calculated. This leads to increased confidence in the results, as well as enabling greater control of training by non-expert users.
This paper discusses the visualisation of ANN training in geographic space, presents an example with discussion, and closes with concluding remarks.
This paper presents a method that helps analysts understand the classification derived by any feedforward network. It does this by visualising network training using the geographic nature of the data. This may be done using any GIS that supports map algebra and is capable of batch processing.
Artificial neural networks (ANNs) have been used in geographical analysis for over a decade. Applications include vegetation and land cover mapping (Fitzgerald & Lees, 1996; Foody et al, 1997; Foody, 1997), land degradation (Mann & Benwell, 1996), ore reserve estimation (Wu & Zhou, 1993), geological mapping (An et al, 1995) and classifying remote sensing data (Miller et al, 1995). The volume of research presented in remote sensing journals and conferences increased exponentially from 1989 to 1995 (Wilkinson, 1997a), demonstrating their perceived importance for analysing environmental data.
Despite the increasing popularity of ANNs as analysis tools for geocomputation, there are still impediments to their use by the wider geographic research community. Due to their non-linearity, hidden layer(s) and large number of connections, ANNs are seen as too complex for users to understand. How a result is derived is not easy to interpret from the weights matrix. Gahegan et al (1996) and German et al (1997) have attempted to reduce the complexity of network construction by automating, as much as possible, the structure and training. However, this still does not enable the user to visualise what the network is doing, and why it is doing it. Such complexity means ANNs are used as black box classifiers, where information is fed in one end and a result appears at the other (Openshaw & Openshaw, 1997; Gahegan et al, 1996). This black box characteristic is seen as a reason to use other classifiers (Friedl & Brodley, 1997).
ANN behaviour is unpredictable because the training algorithms are not guaranteed to converge on a global solution, possibly ending training in a sub-optimal solution. Overfitting results from overtrained networks and reduces the generalisation ability of the network as a classifier. If the user can visualise training in some meaningful way then these problems may be reduced.
Geocomputational analyses deal with geographic objects, normally as layers in a GIS. Maps enable the identification of spatial patterns and associations and are the easiest means of conveying information about spatial objects. Viewing GIS layers as maps is the most common visualisation method. If the user can visualise ANN training as a series of maps there should be increased understanding of, and possibly confidence in, the results.
The remainder of this paper discusses the visualisation of ANN training in geographic space, presents an example with discussion, and closes with concluding remarks.
Feedforward ANNs are constructed from large numbers of simple processing elements, or nodes. These nodes are normally organised into three layers: the input, hidden and output layers. The layers are connected by weights. The input layer is where the input data, e.g. Landsat TM bands, is presented to the network. The output layer contains the predictions of the network. The hidden layer allows feedforward ANNs to model complex relationships.
The value of each node is given by a simple mathematical equation. Each hidden node takes the sum of the input node values multiplied by their connection weights, plus a bias term, and passes the total through a transfer function. The output node values are calculated in the same way from the hidden node values and their connection weights.
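As an illustrative sketch only (in NumPy, with arbitrary random weights, and layer sizes matching the example network described later in this paper), the node calculations can be written as:

```python
import numpy as np

def sigmoid(x):
    # Sigmoidal transfer function, scaling values into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def forward(inputs, w_ih, b_h, w_ho, b_o):
    # Hidden nodes: weighted sum of inputs plus a bias, through the transfer function
    hidden = sigmoid(inputs @ w_ih + b_h)
    # Output nodes: the same calculation applied to the hidden node values
    return sigmoid(hidden @ w_ho + b_o)

# Toy example: 8 inputs, 10 hidden nodes, 9 outputs; all values are random
rng = np.random.default_rng(0)
x = rng.random(8)
w_ih, b_h = rng.normal(size=(8, 10)), rng.normal(size=10)
w_ho, b_o = rng.normal(size=(10, 9)), rng.normal(size=9)
y = forward(x, w_ih, b_h, w_ho, b_o)
```

The transfer function guarantees that every node value falls in the interval (0, 1), which is what makes the later display stretches straightforward.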
In layman's terms, a feedforward network multiplies each input by a weight, adds the weighted inputs together with a bias, and squashes the total into a fixed range; this calculation is repeated layer by layer until the output values are produced.
For geographic analyses the inputs are all derived from geographic datasets. Given this it is possible to duplicate the operation of a feed forward network using a GIS. To do this easily the GIS must support map algebra and batch scripting, for example Arc/Info or GRASS.
Generating geographic datasets from an ANN may be done in a few steps. The first is to extract the connection weights and biases from the network parameter file. Next, the first hidden node dataset is generated from the input datasets and the weights connecting them to the first hidden node (h1). For example, using Arc/Info GRID the first hidden node would be calculated as
h1 = 1 / (1 + exp( -( aspect * -0.01020 + band2 * -5.53406 + band4 * 13.20565 + band7 * 1.06983 + elevation * -5.17787 + geology * 15.89054 + slope * 6.84227 + flowaccumulation * 0.66209 - 16.25757)))

This process is repeated to create each hidden node dataset, and then to create the output node datasets. (The 1 / (1 + exp(-(...))) term is a sigmoidal transfer function used to scale values into the interval [0,1].)
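The same map algebra can be sketched outside the GIS by treating each input dataset as a co-registered 2-D array. The layer names, weights and 2x2 rasters below are hypothetical stand-ins, not the real datasets:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_node(layers, weights, bias):
    # layers: dict of co-registered 2-D rasters; weights: layer name -> connection weight.
    # This mirrors the GRID expression: weighted sum of layers, plus bias, through sigmoid.
    total = sum(layers[name] * w for name, w in weights.items()) + bias
    return sigmoid(total)

# Hypothetical 2x2 rasters standing in for the real input datasets
layers = {"band2": np.array([[0.1, 0.2], [0.3, 0.4]]),
          "slope": np.array([[5.0, 2.0], [1.0, 0.0]])}
h1 = hidden_node(layers, {"band2": -5.53406, "slope": 6.84227}, -16.25757)
```

Each hidden and output node dataset is produced by one call of this kind, so the whole network state can be materialised as a stack of rasters.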
Once the datasets have been created the values of the ANN may be visualised using the geographic nature of the data.
Visualisation of an ANN is, in many implementations, restricted to colour or shape symbology for the weights and node activations (see Figure 1). This is restrictive because only one input pattern's activations may be viewed at a time. Geographic datasets will have hundreds to thousands of possible input combinations, making it very difficult to understand the meaning of the connections in terms of the input datasets.
Figure 1: One normal method of visualising a network structure.
During training the ANN attempts to minimise the classification error by adjusting the weights. It is possible to stop training at any time, save the weights, and continue training without affecting the efficiency of the training process. It is therefore possible to take time slices of the network at intervals and create a visualisation of any network state using the geographic approach outlined above.
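A minimal sketch of such snapshotting, assuming a hypothetical `step` function that performs one training iteration and returns the updated weights:

```python
import numpy as np

def train_with_snapshots(step, weights, iterations=140, interval=10):
    # Run `step` repeatedly and record a copy of the weights every
    # `interval` iterations, starting with the initial state (iteration 0).
    snapshots = {0: [w.copy() for w in weights]}
    for it in range(1, iterations + 1):
        weights = step(weights)
        if it % interval == 0:
            snapshots[it] = [w.copy() for w in weights]
    return snapshots

# Toy "training" step that simply decays every weight by 10% per iteration
snaps = train_with_snapshots(lambda ws: [w * 0.9 for w in ws],
                             [np.ones((8, 10)), np.ones((10, 9))])
```

Each saved snapshot can then be pushed through the GIS replication step to produce one frame of the animation.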
The time slices of the network state are best interpreted using animation. This is because attribute change is best observed when directly overlain by the next representation of that attribute. It is difficult to observe change in fine features when the next display is to one side. Such features might be visible geographic features such as rivers or ridges, or might be the connection weights. Animations provide the easiest means of observing this.
The example dataset is the same as that used by Lees & Ritman (1991), Fitzgerald & Lees (1996), Gahegan et al (1996) and German et al (1997). The dataset consists of a Landsat TM scene (December, 1994), geology and a DEM interpolated from 10m elevation contours. The training dataset is generated from 1705 quadrats; vegetation classes and frequencies are listed in Table 1.
Class   Vegetation class    Frequency
3       Lower slope wet     52
4       Wet E. maculata     251
5       Dry E. maculata     181

Table 1: Vegetation classes.
A feed forward ANN was created with 8 input, 10 hidden and 9 output nodes. The input nodes represent aspect, Landsat TM bands 2, 4 and 7, elevation, geology, slope and flow accumulation. The output nodes represent the vegetation classes listed in Table 1.
The network was trained using backpropagation with momentum. Learning rate and momentum were initially set high and reduced through the training process. The network was initialised with random weights and trained for 140 iterations, by which time the mean squared error (MSE) had stabilised. No attempt was made to optimise training, except to limit the number of ocean samples to 258 in order to reduce biasing of training towards that class.
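The momentum update rule with a decaying learning rate can be sketched on a toy one-parameter error surface; the constants and schedule below are illustrative, not those used for the actual network:

```python
# Minimise the toy quadratic error (w - 1)^2 with momentum updates
w, v = 0.0, 0.0
for it in range(140):
    lr = 0.2 * (0.99 ** it)   # learning rate reduced through training
    grad = 2.0 * (w - 1.0)    # gradient of the toy error surface
    v = 0.8 * v - lr * grad   # momentum carries part of the previous step forward
    w += v
# After 140 iterations w has converged close to the minimum at 1.0
```

The momentum term smooths the weight trajectory, which is one reason the animated weight displays change gradually rather than jumping between iterations.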
Time slices of the network weights matrix were taken at an interval of 10 iterations, starting with iteration 0. For each time slice the hidden and output datasets were generated. Output images were constructed using the network structure and weights.
The final output of the network is shown in Figure 2. Network training is shown as an animation in Figure 3. For display, the hidden nodes have been equal-area stretched as individual datasets. Output nodes are displayed using a linear equal-interval stretch; colours represent the same values in each output dataset. The weights are scaled using natural log transformations and then rescaled to use all available colours. The transformations differ for the hidden and output connections. Actual values are not shown because it is the relative values that are important.
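A linear equal-interval stretch of the kind used for the output nodes can be sketched as follows (the 2x2 raster and colour count are hypothetical):

```python
import numpy as np

def equal_interval_stretch(raster, n_colours=256):
    # Linear equal-interval stretch: map raster values onto n_colours equal bins
    lo, hi = raster.min(), raster.max()
    scaled = (raster - lo) / (hi - lo)
    return np.clip((scaled * n_colours).astype(int), 0, n_colours - 1)

r = np.array([[0.0, 0.5], [0.75, 1.0]])
codes = equal_interval_stretch(r)
```

Because the stretch is linear and shared across output datasets, the same colour represents the same activation value in every output node display.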
The output hard classification is shown in Figure 4, also as an animation. This was generated by taking, at each point, the output node with the highest value. Figure 4 is shown because it is useful to observe the output classification alongside the ANN structure.
Figure 3: Animation showing network training in geographic space.
Figure 4: Animation showing network output in geographic space.
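The hard classification step, taking the highest-valued output node at each point, can be sketched as follows (the three 2x2 output rasters are hypothetical):

```python
import numpy as np

def hard_classification(output_rasters):
    # Stack the per-class output node rasters and take, at each cell,
    # the index of the class whose output node has the highest value
    stack = np.stack(output_rasters)   # shape: (n_classes, rows, cols)
    return np.argmax(stack, axis=0)

# Three hypothetical 2x2 output-node rasters, one per class
o = [np.array([[0.90, 0.10], [0.20, 0.30]]),
     np.array([[0.05, 0.80], [0.10, 0.10]]),
     np.array([[0.05, 0.10], [0.70, 0.60]])]
classes = hard_classification(o)
```

Applying this to each time slice yields the frames of the hard-classification animation.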
Figure 2 shows some of the advantages of visualisation using geographic space. It can be seen that hidden node 2 is largely a combination of input nodes 2 and 6, band 2 and geology. It can also be seen which hidden nodes contribute to which output nodes, although this is more complex.
Figures 3 and 4 show some interesting features of the network training. After initialisation with random weights the network oscillates, with both the hidden and output nodes changing activation. By iteration 100 the hidden nodes appear to have stabilised, with only node 9 exhibiting any significant change. After iteration 100 it is the output nodes that change, finalising the convergence on a solution. This is visible in the changes in Figure 4 at these iterations. It should be noted that after iteration 100 the MSE was close to its final value.
Observing only the network weights in Figure 3, one sees little change after iteration 70 for the input-to-hidden layer weights, and after iteration 100 for the hidden-to-output layer weights. Despite this, subtle changes continue to alter the response of the network to some input conditions, and these are visible only through this geographic display.
The display methods used for Figures 2 and 3 are very simple. It would be worthwhile trying different colour enhancements to identify other features in the data. This is up to the individual user's preferences for display colours and enhancements.
One of the major benefits of visualising ANN training in geographic space is that a domain expert can see when a classification has gone 'wrong'. A geographic image provides more information than the single-value error reporting common to most classifiers, such as root mean square error, because the spatial associations of the classes can be examined. For example, training may converge on a nonsensical solution where rainforest occurs in the ocean. Alternatively, the prediction may be biased towards one class because it is over-represented in the training data. Such solutions may minimise the error but are obviously incorrect when observed by a domain expert; they are examples of convergence on a sub-optimal solution, or of unpredictable behaviour. In such cases training may be restarted from the time of the error, with some parameter adjustment, to converge on a more sensible solution. Overfitting is more difficult to observe, but it may be possible to see when a class is too closely aligned with a particular set of input conditions.
This visualisation is also valuable when two or more correlated variables each contribute partially to a result. Viewing only the weights draws the user's attention to the largest individual weights. Where the correlated variables each have a small or medium weight, their combined contribution may dominate the result because of the additive nature of ANNs. This is most easily seen using geographic visualisation.
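A toy numerical example of this effect, with made-up values: two correlated inputs with modest weights can together outweigh a single input carrying the largest weight.

```python
import numpy as np

x_corr = np.array([0.80, 0.75])   # two correlated variables (similar values)
w_corr = np.array([0.40, 0.45])   # small-to-medium connection weights
x_big, w_big = 0.3, 0.9           # single input with the largest weight

contribution_pair = float(x_corr @ w_corr)   # combined weighted contribution
contribution_big = x_big * w_big             # contribution of the "dominant" weight
```

Inspecting the weights alone would highlight the 0.9 connection, yet the correlated pair contributes more to the node's weighted sum.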
Figure 3 suggests that visualisation and interpretation of larger or more complex network structures in this way is difficult. However, this problem applies to any analysis using large numbers of input datasets. To overcome it the user should visualise only part of the available information at a time, either by zooming to a specific geographic extent to observe the interactions for that location, or by restricting the display to a few interesting nodes and connections. For example, viewing only one output dataset would reduce the number of connection weights to be displayed, facilitating easier interpretation.
Other problems may occur when the input datasets appear similar, for example multi-temporal classifications using multi-date satellite data, or spatial context analyses. In such cases visualising the weights will generate extra information that may lead to a better interpretation.
It should be remembered that other forms of data interpretation may be combined with this method, for example error matrices, viewing the second-highest predicted class, or watching the error matrix change through the training process. It is also possible to use other spatial operators, although these are probably best applied only to the output nodes.
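An error (confusion) matrix of the kind mentioned above can be computed from any output time slice; a minimal sketch with made-up class labels:

```python
import numpy as np

def error_matrix(predicted, actual, n_classes):
    # Error (confusion) matrix: rows = actual class, columns = predicted class
    m = np.zeros((n_classes, n_classes), dtype=int)
    for a, p in zip(actual.ravel(), predicted.ravel()):
        m[a, p] += 1
    return m

pred = np.array([0, 1, 1, 2])   # hypothetical hard-classified cells
act = np.array([0, 1, 2, 2])    # hypothetical ground truth
m = error_matrix(pred, act, 3)
```

Computing this matrix for each snapshot shows how the class confusions evolve through training, complementing the map animations.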
It is not currently possible to view network training in geographic space in real time without the aid of a supercomputer, although this is a matter of time rather than impossibility. In any case there are good reasons to take time slices and view them after training: playback may proceed at a rate suitable for the user, with speed and direction adjusted as required, and it is easier to focus on several areas of interest once training is complete. Restarting training is also simple, as it requires only the weights to be loaded, and these are stored each time a network snapshot is taken.
This method does not exclude other methods of ANN visualisation. Rather, it provides another means of interpretation, one that is powerful because it is easier for people to interpret maps than other forms of data presentation. If the user has a more meaningful understanding of network training then confidence in the results will increase.
With this method users can better understand how ANN training achieved a solution in terms of the input datasets. It does not entirely remove the black box tag but does enable some insight into the operation of an ANN. This makes them more transparent to users and should encourage more practitioners in the spatial sciences to make use of them as analysis tools.
The approach described here has been implemented in simple form by the author using Arc/Info AML and GRID and the Stuttgart Neural Network Simulator (SNNS). Code is available on request.
An, P., Chung, C.F. & Rencz A.N., 1995; Digital lithology mapping from airborne geophysical and remote sensing data in the Melville peninsula, northern Canada, using a neural network approach, Remote Sensing of Environment, 53(2), 76-84.
Fitzgerald, R.W. & Lees, B.G., 1996; Temporal context in floristic classification, Computers and Geosciences, 22 (9), 981-994.
Foody, G.M., 1997; Fully fuzzy supervised classification of land cover from remotely sensed imagery with an artificial neural network, Neural Computing & Applications, 5 (4), 238-247.
Foody, G.M., Lucas, R.M., Curran, P.J. & Honzak, M., 1997; Non-linear mixture modelling without end-members using an artificial neural network, International Journal of Remote Sensing, 18 (4), 937-953.
Friedl, M.A. & Brodley, C.E., 1997; Decision tree classification of land cover from remotely sensed data, Remote Sensing of Environment, 61 (3), 399-409.
Gahegan, M., German, G. & West, G., 1996; Automatic neural network configuration for the classification of complex geographic datasets, Proceedings of the First International Conference on Geocomputation, University of Leeds, UK, 343-358.
German, G., Gahegan, M. & West, G., 1997; Predictive assessment of neural network classifiers for applications in GIS, Proceedings of Geocomputation '97 & SIRC '97, University of Otago, New Zealand, 41-50.
Lees, B.G. & Ritman, K., 1991; A decision tree and rule induction approach to the integration of remotely sensed and GIS data in the mapping of vegetation in disturbed or hilly environments, Environmental Management, 15, 823-831.
Mann, S. & Benwell, G.L., 1996; The integration of ecological, neural and spatial modelling for monitoring and prediction for semi-arid landscapes, Computers and Geosciences, 22 (9), 1003-1012.
Miller, D.M., Kaminsky, E.J. & Rana S., 1995; Neural network classification of remote-sensing data, Computers & Geosciences, 21(3), 377-386.
Openshaw, S. & Openshaw, C., 1997; Artificial Intelligence in Geography, Wiley, Chichester, 329pp.
Wilkinson, G.G., 1997a; Neuro-computing for Earth observation - recent developments and future challenges, pp 289-305 in Fischer, M.M. & Getis, A., 1997; Recent Developments in Spatial Analysis: Spatial Statistics, Behavioural Modelling and Computational Intelligence, Springer, Berlin, 429pp.
Wilkinson, G.G., 1997b; Open questions in neuro-computing for earth observation, pp 3-13 in Kanellopoulos, I., Wilkinson, G.G., Roli, F. & Austin, J. (eds), Neuro-computation in Remote Sensing Data Analysis, Springer-Verlag, Berlin, 284pp.
Wu, X.P. & Zhou, Y.X., 1993; Reserve estimation using neural network techniques, Computers & Geosciences, 19(4), 567-575.