Robert J. Abrahart
Department of Geography, University College Cork, Ireland.
Linda See and Pauline E. Kneale
School of Geography, University of Leeds, Leeds, LS2 9JT.
River flow prediction and forecasting are important environmental functions. The successful application of detailed physical-mathematical models offers one possible source for the provision of these estimates. But such models are often too complex, or too demanding in terms of data and computer requirements, for practical implementation purposes. Simpler approaches offered through 'conceptual' and 'black-box' modelling are thus attractive alternatives. Foremost in this re-emergent field is the use of computational intelligence tools such as neural networks and genetic algorithms - which are being investigated as potential mechanisms for the provision of detailed hydrological estimates.
However, irrespective of recent computational and methodological advances, several fundamental problems still need to be addressed - such as the selection of an optimal neural network architecture for each given task. A number of simple and novel solutions to this problem have been put forward in the guise of built-in functions and add-on software tools. These computational resources can be used to diminish the amount of subjective guesswork that is needed to resolve difficult network design issues. It is therefore important that scientists begin to examine the various options that are now available and in particular the extent to which the application of such devices can be used to assist the hydrological modelling effort.
This paper provides some numerical results from an initial investigation into the use of automated neural network design tools for the creation of improved network architectures based on a 'one-step-ahead prediction' of continuous flow records for the Upper River Wye catchment 1984-6. Four alternative neural network modelling strategies were implemented; the first investigation involved using standard procedures to create a set of standard networks; in the next two investigations two simple pruning algorithms were used to create a set of more efficient architectures, and in the last investigation a genetic algorithm package was used to breed a set of optimised neural network modelling solutions based on random mutation and survival of the fittest.
Neural networks have been applied to various hydrological modelling tasks and the application of these multifarious technologies will form an expanding area of scientific investigation throughout the next decade. Basic feedforward backpropagation networks and simplistic 'one-step-ahead river flow predictions' form the bulk of this work. However, in application based research, hydrological science has now begun to witness the adoption of alternative neural network strategies. For example, Self Organizing Map (SOM) data classification techniques (Kohonen, 1995) have been used to split the data into different subsets, which will in turn facilitate more accurate simulation through integrated multi-network modelling (Abrahart & See, 1998). More recent work has also involved the adoption of neural network solutions as embedded functions contained within Third Generation Language (3GL) programs that operate input-output feedback loops (Abrahart, 1998). However, irrespective of these recent computational and methodological advances, fundamental problems still need to be addressed - such as the selection of an optimal neural network architecture. Recent software advances in the guise of built-in functions and add-on tools can now be used to diminish the amount of subjective guesswork that is needed to resolve difficult network design issues. It is therefore important that scientists begin to examine these tools and the extent to which the application of such devices can be used to assist the hydrological modelling effort.
This paper provides some numerical results from an initial investigation into the use of various automated neural network design tools for the creation of improved network architectures based on a 'one-step-ahead prediction' of continuous flow records for the Upper River Wye catchment 1984-6. Four alternative neural network modelling strategies were implemented; the first investigation involved using standard procedures to create a set of standard networks; in the next two investigations two simple pruning algorithms were used to create a set of more efficient architectures, and in the last investigation a genetic algorithm package was used to breed a set of optimised neural network modelling solutions based on random mutation and survival of the fittest.
In general terms these collective experiments were designed to investigate the raw power, modelling possibilities and application potential that is associated with the use of these computer-based algorithms to:
Neural networks offer an important alternative to the traditional
methods of analysis and modelling. For example, in conventional
computing, a model is expressed as a series of equations which
are then translated into 3GL code and run on a computer. But a
neural network is much more flexible. Instead of being told the
precise nature of a relationship or model - the neural network
is trained to best represent the relationships and processes that
are implicit, albeit invisible, within the data. Neural networks
could thus be used to provide a robust error-tolerant multi-dimensional
non-linear solution in certain situations that would otherwise
present the hydrological modeller with a difficult modelling task.
However, in common with regression analysis, the fact that a relationship
between input (independent variables) and output (dependent variables)
can be modelled with a neural network provides no direct proof
that a connection or causal relationship exists. There could indeed
be no sensible or logical link between them. So, in all cases,
the final model and its internal relationships will demand a theoretical
justification that is made on logical grounds; a point that is
of particular importance with regard to the use of automated neural
network design tools of the kind that are being investigated and
reported on here. For readers who require additional information
on this subject a more detailed introduction to artificial neural
networks can be found in Openshaw & Openshaw (1997).
Neural networks are seen to offer a plethora of good hydrological modelling opportunities and various successes have been reported in the literature e.g. rainfall forecasting in space and time (French et al., 1992); predicting river flow levels at ungauged sites (Karunanithi et al., 1994); spatial interpolation of aquifer properties (Rizzo & Dougherty, 1994); optimisation of a groundwater model (Rogers & Dowla, 1994); modelling the rainfall-runoff transformation using a combination of areal and point based measurements (Lorrai & Sechi, 1995); synthesizing reservoir inflow records (Raman & Sunilkumar, 1995); modelling synthetic sequences of rainfall-runoff data (Minns & Hall, 1996); and modelling soil water retention curves (Schaap & Bouten, 1996). These powerful CI (Computational Intelligence; see Fischer & Abrahart, forthcoming) tools can be used to model raw data, trained to clone existing models, or implemented in various modes of computational association with equation-based tools. Most research has to date focused on rainfall-runoff applications that range from modelling a 5 x 5 cell synthetic watershed using inputs derived from a stochastic rainfall generator (Hsu et al., 1995), to predicting runoff for the Leaf River Basin (1,949 km²) using five years of daily data (Smith & Eli, 1995), and to constructing robust models of fifteen-minute flows with six hour lead times for the Rivers Amber and Mole (Dawson & Wilby, 1998).
There are no hard and fast rules governing the correct design of a neural network. It is axiomatic that more complex problems will require more complex solutions. However, when there are a large number of free parameters, the network will be (a) slower to train and (b) more susceptible to overfitting. Important factors such as the number of inputs, the number of hidden units, and the arrangement of these units into layers are often determined using 'trial and error' experimental design procedures (e.g. Fischer & Gopal, 1994) or fixed in advance according to the subjective opinion of each individual designer (e.g. Abrahart & Kneale, 1997). The laborious task of testing for optimum inputs and architectures can be a time consuming process and the end result will often be neither that informative nor altogether conclusive or convincing. The main aim of this investigation - from a computational perspective - was thus to investigate the use of modern technologies to build better and more efficient neural network hydrological models. In the initial stages of this emergent paradigm it is also important to examine and report on the science involved. Little is known for certain about the temporal dimension of neural network rainfall-runoff modelling. So another major goal of this research was to help generate a better understanding of relevant inputs and operational considerations related to neural network rainfall-runoff modelling. These experiments were also used to provide some additional insights into the modelling process, via an explanation of the significance and power associated with the use of different inputs, in particular past river flow inputs - since these data are peculiar to neural network hydrological modelling operations.
The area chosen for this study was the Upper River Wye in Central Wales (Figure 1). This is an upland research catchment that has been used on several previous occasions for various hydrological modelling purposes e.g. Beven et al. (1984); Bathurst (1986); Quinn & Beven (1993). The basin covers an area of some 10.55 km², elevations range from 350 to 700 m above sea level, and average annual rainfall is in the order of 2500 mm. Ground cover comprises grass or moorland. Soil profiles are thin, most of the area being peat, overlying a podzol or similar type of soil. Runoff response is dominated by saturated sub-surface flow, especially at the interface between the two soil layers, and by overland flow following saturation of the peat layer (Knapp, 1970; Newson, 1976).
The data that were available for this area were for Cefn Brwyn
(Gauging Station Number 55008), comprising rainfall (RAIN), potential
evapotranspiration (PET), and river flow ordinates (FLOW) on a
one hour timestep for the period 1984-6. The data were first pre-processed
into what has now become the standard format for temporal neural
network modelling. The resultant multi-column file had separate
columns for: annual hour-count (CLOCK), RAIN t, RAIN t-1 to t-6,
PET t, PET t-1 to t-6, FLOW t-1 to t-6, and FLOW t. The six-hour
historical record was considered sufficient for predictive modelling
purposes based on previous reported experiments (Abrahart &
Kneale, 1997). It also tallies with the empirical rule that at
least five or six points should be used to define the rising limb
of a finite-period unit hydrograph, which dates back at least
to F.F. Snyder in the late-1930s (Johnstone & Cross, 1949),
and is promulgated in the UK Flood Studies Report (NERC, 1975).
Given the circular nature of CLOCK these particular values were
transformed into their sine and cosine equivalents, making a total
of twenty-three variables, as shown in Figure 2. All variables
were next subjected to linear normalisation between zero (lowest
possible value for that variable in the database) and one (highest
possible value for that variable in the database). The normalised
file was then split into three individual data sets: 1984, 1985
and 1986. To help keep matters simple - all river flow values
are henceforth reported in terms of these 'normalised flow units'.
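The pre-processing steps described above can be illustrated with a minimal sketch. It assumes the hourly series are held as plain Python lists; the function names, the 8760-hour period used for CLOCK, and the choice to normalise over whatever rows are passed in are illustrative assumptions and are not taken from the original study, in which the limits were taken over the full database before the file was split into years.

```python
import math

HOURS_PER_YEAR = 8760  # assumption: non-leap year (1984 would need 8784)

def encode_clock(hour_count, period=HOURS_PER_YEAR):
    """Map a circular hour count onto its sine and cosine equivalents."""
    angle = 2.0 * math.pi * hour_count / period
    return math.sin(angle), math.cos(angle)

def build_patterns(clock, rain, pet, flow, lags=6):
    """Return one row per usable hour: 22 inputs followed by the target FLOW t."""
    rows = []
    for t in range(lags, len(flow)):
        s, c = encode_clock(clock[t])
        row = [s, c]
        row += [rain[t - k] for k in range(0, lags + 1)]  # RAIN t, t-1 .. t-6
        row += [pet[t - k] for k in range(0, lags + 1)]   # PET t, t-1 .. t-6
        row += [flow[t - k] for k in range(1, lags + 1)]  # FLOW t-1 .. t-6
        row.append(flow[t])                               # target: FLOW t
        rows.append(row)
    return rows

def normalise_columns(rows):
    """Linear normalisation of every column to the range [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(col) for col in cols]
    hi = [max(col) for col in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]
```

Each row holds twenty-three values (two CLOCK components, seven RAIN, seven PET, six lagged FLOW and the FLOW t target), which matches the variable count given in the text.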
Four sets of neural network models were created and tested. An initial set of models was developed using standard training procedures. These models were intended to act as an 'experimental control' or 'benchmark' against which the other three sets of models could then be compared. All models had an identical starting point and, where practical considerations allowed, the same training methods and parameters were used.
The Stuttgart Neural Network Simulator (SNNS) was used to construct a two-hidden-layer feedforward network. This network comprised a 22:16:14:1 architecture, with all standard connections enforced, and with no cross-layer connections permitted (Figure 3). The input nodes were: sin[CLOCK], cos[CLOCK], current RAIN [t], last six RAIN recordings [t-1 to t-6], current PET [t], last six PET recordings [t-1 to t-6], and the last six FLOW ordinates [t-1 to t-6]. The output node corresponded to current FLOW [t]. All connection weights and unit biases were initialised with random numbers - set between plus and minus one. The design of this network was based on earlier work where an identical architecture, that was trained on the same database, had been observed to perform in an acceptable manner (Abrahart & Kneale, 1997). This particular network formed the initial starting point for each subsequent individual investigation.
[layers are displayed in two column format]
In addition to the 'standard procedure' for training a backpropagation network, two different types of training algorithm were used to build alternative models, resulting in numerous additional architectures being created from the initial network for each individual set of training data - as shown in Figure 4:
The Stuttgart Neural Network Simulator (SNNS) was run in batch file mode and used to perform all standard neural network modelling operations and training procedures. It was also used to implement the two automated network pruning algorithms which are both available for use as internal functions.
In the standard procedure and network pruning experiments the initial 22:16:14:1 network was trained on one annual data set and tested with the other two. This operation was in turn repeated for each of the three individual data sets and an optimal solution for each model building scenario was selected. Statistical and graphical comparisons between the various preferred neural network solutions then followed. This multiple training and testing, using data from three different hydrological periods, facilitated a number of informative comparisons - more so since 1984 was a drought year; 1985 contained a limited number of intermediate events; and 1986 had a far higher proportion of 'information rich' event-related data.
In the standard procedure and network pruning experiments all network training was undertaken using the SNNS 'enhanced backpropagation' algorithm (E-BPROP). All training patterns were presented in random order with weight updates being implemented after the presentation of each individual pattern. All batch files were set to run for 6500 epochs (training cycles). The 'learning rate' parameter was set at 0.2; the 'momentum' parameter was held constant at 0.1; the 'flat spot elimination' and 'maximum tolerated difference' parameters were kept at zero. This decision to use low levels of learning and momentum was based on earlier experiments. Too much rapid forcing is known to produce wild fluctuations that are difficult to control and is a problem that can be attributed to the poor spread of training data i.e. much of the solution surface is dedicated to modelling a flat response with intermittent storm events.
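The role of the learning rate and momentum parameters can be illustrated with the generic textbook form of a gradient descent weight update with a momentum term. This is a sketch only: the function name is hypothetical, and the exact update implemented inside SNNS (including its flat spot elimination term, here taken to be zero as in the experiments) is not reproduced.

```python
def momentum_update(weight, gradient, prev_delta, learning_rate=0.2, momentum=0.1):
    """Apply one online weight update with a momentum term.

    gradient   : partial derivative of the pattern error with respect to the weight
    prev_delta : weight change applied at the previous update
    Returns the new weight and the weight change just applied.
    """
    delta = -learning_rate * gradient + momentum * prev_delta
    return weight + delta, delta
```

With low values for both parameters each update is a small step, which is consistent with the stated aim of avoiding wild fluctuations during training.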
Three standard training runs were undertaken to provide an experimental
control or benchmark against which some sort of relative comparison
could be made. Each annual data set was in turn used to train
the initial 22:16:14:1 network. Sum squared error statistics were
computed for all three data sets at 100 epoch intervals and these
results were then translated into a combined graph from which
the best overall modelling solution for each annual data set could
be selected using visual inspection (Figures 5 to 7).
Three weight-pruning training runs were undertaken. Training was in all but one respect identical to that used for the standard model, the difference being that after each period of 100 training epochs the five weighted connections that had the smallest weight magnitudes were deleted, which created a network that became less and less complicated over time. The idea behind this technique is that the smallest weights will be associated with the weakest connections that transmit the least significant throughputs. This connection elimination procedure was allowed to run until the network was no longer able to function. The sum squared error statistics at 100 epoch intervals are plotted in Figures 8 to 10.
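The weight-pruning schedule can be sketched as follows. The dictionary representation of the connections, the function names and the `train_one_epoch` callback are hypothetical; stopping when no connections remain is a crude stand-in for the point at which the network is 'no longer able to function', and the sketch is not the SNNS implementation.

```python
def prune_smallest_weights(weights, n_remove=5):
    """Delete the n_remove connections with the smallest absolute weight.

    weights maps (source, target) pairs to weight values and is modified
    in place. Returns the list of deleted connections.
    """
    victims = sorted(weights, key=lambda k: abs(weights[k]))[:n_remove]
    for key in victims:
        del weights[key]
    return victims

def train_with_pruning(weights, train_one_epoch, epochs, interval=100, n_remove=5):
    """Alternate ordinary training with pruning every `interval` epochs."""
    for epoch in range(1, epochs + 1):
        if not weights:  # assumption: nothing left to train
            break
        train_one_epoch(weights)  # hypothetical callback that updates weights
        if epoch % interval == 0:
            prune_smallest_weights(weights, n_remove)
    return weights
```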
Three node-pruning training runs were undertaken. However, in this instance, after each period of 100 training epochs the node that produced the least amount of overall change in the global sum squared error statistic when omitted was deleted - which again produced a network that became less and less complicated over time. The idea behind this technique is that the lowest change in error would be associated with the least significant node. This node elimination procedure was allowed to run until the network was no longer able to function. The sum squared error statistics are plotted in Figures 11 to 13.
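The node selection rule can be sketched in a few lines. The toy 'network' below is simply a weighted sum of stored hidden unit activations, and all names and numbers are hypothetical; the sketch shows the selection criterion only, not the SNNS implementation.

```python
def sse(predicted, observed):
    """Sum squared error between two equal-length sequences."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))

def least_significant_node(nodes, error_without):
    """Return the node whose omission changes the global error the least.

    error_without(node) must return the sum squared error with that node
    omitted; error_without(None) returns the error of the intact network.
    """
    baseline = error_without(None)
    return min(nodes, key=lambda n: abs(error_without(n) - baseline))

# Toy illustration: three hidden units feeding a linear output (made-up values).
activations = {"h1": [0.9, 0.1, 0.5], "h2": [0.01, 0.02, 0.01], "h3": [0.3, 0.8, 0.4]}
out_weights = {"h1": 1.0, "h2": 1.0, "h3": 0.5}
observed = [1.0, 0.5, 0.7]

def toy_error_without(omitted):
    predictions = [
        sum(out_weights[n] * activations[n][p] for n in activations if n != omitted)
        for p in range(len(observed))
    ]
    return sse(predictions, observed)

weakest = least_significant_node(list(activations), toy_error_without)
```

In this toy case the unit with near-zero activations is selected, since removing it barely alters the error.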
One critical issue for the successful application of a neural network concerns the complex relationship that exists between learning and generalisation. It is important to stress that the ultimate goal of network training is not to learn or reproduce an exact representation of the training data, but rather to build a model of the underlying process(es) which generated that data, in order to achieve a good generalisation or out-of-sample performance. It is therefore important to validate the final product not in terms of its training data but in terms of its application to the other two data sets. Network training error also fluctuates quite a bit at various points during the training process, often to a marked degree, which thus renders a quantitative assessment difficult. The decision was therefore made to investigate extended runs and to undertake a visual assessment of the performance of the two validation data sets, in each model building scenario, to determine in each case a 'preferred' neural network solution. In most cases the optimal model was selected at a point where the error associated with one or other of the two validation data sets began to increase in a continuous manner and with no subsequent fallback. Vertical dashed lines denote the chosen network solutions on each of the nine training graphs depicted in Figures 5 to 13. Attention is drawn to the use of a log scale for plotting the sum squared error statistic.
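The selection in this study was made by visual inspection. Purely as an illustration, one possible way to formalise the rule is to take, for each validation series, the checkpoint at which a final uninterrupted rise in error begins, and then to choose the earlier of the two. The function names are hypothetical and this is one reading of the criterion, not the procedure used by the authors.

```python
def start_of_final_rise(errors):
    """Index at which the error starts to rise and never falls back again.

    Returns None if the series is not rising at the end of the run.
    """
    i = len(errors) - 1
    while i > 0 and errors[i - 1] <= errors[i]:
        i -= 1
    return i if i < len(errors) - 1 else None

def choose_checkpoint(val_a, val_b):
    """Pick the earliest point at which either validation series turns upward.

    Falls back to the last checkpoint if neither series shows a final rise.
    """
    turns = [start_of_final_rise(val_a), start_of_final_rise(val_b)]
    candidates = [i for i in turns if i is not None]
    return min(candidates) if candidates else len(val_a) - 1
```

With errors recorded every 100 epochs, a returned index i would correspond to epoch 100 x (i + 1), assuming the first stored value is for epoch 100.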
[dashed vertical line indicates position of chosen model]
ENZO is a dedicated software package, that can be used to operate the Stuttgart Neural Network Simulator, and comprises a genetic algorithm tool which has been adapted for the task of neural network optimisation. Constructing a neural network solution to a given problem can be a difficult task since it involves choosing a particular network architecture (i.e. number of layers, number of units per layer, and patterns of connection), together with a set of network coefficients (i.e. weights, thresholds, etc.), that will, in combination, produce an optimal performance for a given modelling situation. All global optimisation heuristics, when faced with complex optimisation problems, must adopt a balance between the level of exploration and the level of exploitation since:
Evolution-based algorithms avoid the problem of becoming trapped in a local minimum through the use of a parallel search process, comprising a population of search points (individuals), and stochastic search steps i.e. stochastic selection of the parents and stochastic generation of their offspring (mutation and crossover). But this search procedure is nevertheless in a broad sense still biased towards exploitation since it is the fittest parents that are selected for the creation of future generations. Moreover, genetic algorithms are problem independent, and will therefore neglect vital problem dependent knowledge such as gradient information relating to the solution surface. So the use of a pure evolution-based genetic algorithm will at best produce modest results in comparison to other heuristics that can exploit this additional information. However, in the case of neural networks, each individual model is capable of moving down the solution surface gradient on its own - using standard gradient descent procedures such as backpropagation. The application of a hybrid evolution-based method can therefore enable us to restrict the search space to a set of local optima using a two phased operation:
Level-1 heuristic: periods that contain coarse steps based on evolution, which are intertwined with ...
Level-2 heuristic: periods that contain fine steps for local optimisation
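The two-level idea can be illustrated with a toy sketch in which coarse mutation steps alternate with short runs of gradient descent. Everything here is a hypothetical stand-in: a one-dimensional multi-modal function replaces the network error surface, a single number replaces a network, and the population size, mutation width and selection scheme are arbitrary choices that do not describe ENZO.

```python
import math
import random

def loss(x):
    """Toy multi-modal surface standing in for the network error."""
    return x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x))

def grad(x):
    """Analytical derivative of the toy surface."""
    return 2.0 * x + 20.0 * math.pi * math.sin(2.0 * math.pi * x)

def local_optimise(x, steps=50, rate=0.002):
    """Level-2 heuristic: fine gradient descent steps towards a local optimum."""
    for _ in range(steps):
        x -= rate * grad(x)
    return x

def evolve(pop_size=10, generations=30, offspring=10, sigma=1.0, seed=0):
    """Level-1 heuristic: stochastic parent selection and mutation."""
    rng = random.Random(seed)
    population = [local_optimise(rng.uniform(-5.0, 5.0)) for _ in range(pop_size)]
    for _ in range(generations):
        # Stochastic selection biased towards fitter parents (binary tournament).
        parents = [min(rng.sample(population, 2), key=loss) for _ in range(offspring)]
        # Coarse mutation step followed by local optimisation of each child.
        children = [local_optimise(p + rng.gauss(0.0, sigma)) for p in parents]
        # Survival of the fittest: keep the best pop_size individuals.
        population = sorted(population + children, key=loss)[:pop_size]
    return population[0]
```

Because every individual is moved to a nearby local optimum before it is evaluated, the evolutionary search in this sketch operates over local optima rather than over the whole surface.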
ENZO has a lot of parameters that can be adjusted. Although this in itself might appear to pose an additional optimisation problem, implementation of sensible default values within the program makes the engineering of a poor result quite difficult, and the code is in fact quite robust. Nevertheless, given that a certain degree of modification could lead to superior model building, and that the required modifications could well be problem dependent - it might at some later date be useful to tailor various specific aspects of the algorithm in one form or another. Such items are the subject of alternative explorations and further research.
In these initial experiments a straightforward modelling operation was undertaken and various important details relating to the chosen method of application are as follows:
Resilient propagation (RPROP) was used to train each of the numerous
mutated networks. RPROP is a fast, local adaptive learning scheme,
and performs supervised batch learning in multi-layer perceptrons
(Riedmiller & Braun, 1993). It is therefore one of the best
algorithms for handling a large number of networks that need rapid
training. The basic principle behind RPROP is to eliminate the
harmful influence of the partial derivative on the weight step.
Thus, only the sign of the derivative is used to indicate the
direction of the weight update, with the size of the weight change
being determined from a weight-specific update value that is also
based on a sign-dependent process. Each time the partial derivative
of a weight changes its sign, this indicates that the last update
value was too big, and that the algorithm has therefore jumped
over a local minimum. The update value is therefore decreased.
If the derivative maintains its sign the update value is given
a small increase in order to accelerate convergence in shallow
regions of the solution surface. Since RPROP is attempting to
adapt its learning process to the error function, weight-update
and adaptation are performed after the gradient information of
the whole pattern set is computed, which means that a batch or
epoch learning process must be used. Default parameters were set
as follows: initial update-value [0.1], limit for maximum step
[50.0], and weight-decay exponent [4.0].
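The sign-based update described above can be sketched as follows. This is a simplified variant without weight backtracking or weight decay; the increase and decrease factors (1.2 and 0.5) and the minimum step are commonly quoted defaults and are assumptions here, whereas the initial update value of 0.1 and maximum step of 50.0 follow the text. The function names are hypothetical.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def rprop_step(weights, grads, prev_grads, steps,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """One batch RPROP update; weights, prev_grads and steps are changed in place.

    grads holds the gradient of the error summed over the whole pattern set.
    """
    for i, g in enumerate(grads):
        change = g * prev_grads[i]
        if change > 0:    # same sign: accelerate in shallow regions
            steps[i] = min(steps[i] * eta_plus, step_max)
        elif change < 0:  # sign flipped: a minimum was jumped over
            steps[i] = max(steps[i] * eta_minus, step_min)
            g = 0.0       # skip the weight update for this epoch
        weights[i] -= sign(g) * steps[i]
        prev_grads[i] = g

# Toy usage on a quadratic bowl with its minimum at (3, -1).
w = [0.0, 0.0]
previous = [0.0, 0.0]
update_values = [0.1, 0.1]  # initial update value from the text
for _ in range(200):
    gradient = [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]
    rprop_step(w, gradient, previous, update_values)
```

Only the sign of each derivative enters the weight change, so the very different curvatures along the two axes of the toy problem do not affect the step sizes directly.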
Six random mutation runs were undertaken based on the initial
22:16:14:1 network. Each annual data set was used in a paired
formation. Within each pair, one data set was used to perform
the Level-1 heuristic, with the other set being used to perform
the Level-2 heuristic. Network results were also calculated for
the third data set on each occasion and used for comparative purposes.
Architectural changes were restricted to (a) network pruning and
(b) single parent random mutation operations. No fittest-parent
feature implantation or direct transfer of short-cut crossover
connections were permitted. Low mutation probabilities, with an
equal split between insertion and deletion, were selected in each
case which enabled us to experience some degree of alteration
whilst at the same time maintaining a stable network configuration.
More radical mutation will be done in later experiments. The program
was in each case run for 30 generations and the fittest 100 networks
were saved to file. This made a total of 400 networks that were
tested and evaluated on each annual dataset, comprising 100 original
models and 300 (10x30) offspring, making a grand total of 2,400
networks that were examined. In the previous experiments there
were two independent data sets that could be used to evaluate
the best overall network solution. However, this operation could
not be repeated on the final population, or at least could not
be done in the same manner because two full data sets had been
involved in the model construction process. So at best there was
now just one independent data set that could be used for model
evaluation purposes - although it was apparent in the previous
work that the difference in performance between one river flow
data set and another was such that one set on its own could not
give an adequate representation. Given this situation, together
with the experimental nature of this project, it seemed sensible
to retain the use of the internal fitness measure for determining
which was the best overall network model. The fittest member was
therefore selected from each final population, for further investigation
of its architecture, statistics and hydrographs, and for comparison
with the optimal solutions derived from the other training exercises.
To provide a program validation check on all members of each final
population, the six hundred saved networks were all tested in
terms of sum squared error related to the three annual data sets,
and no major discrepancies were identified.
Simple summaries for the purpose of assessment can often be useful.
A straightforward counting exercise was therefore
performed on the different network architectures that comprised
the chosen pruning algorithm and genetic algorithm solutions.
The collated information is provided in counts relating to the
number of nodes per layer, the total number of connections, and
the percentage of original items remaining. These statistics are
reproduced below in Tables 1 and 2.
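The baseline for these counts follows from the initial 22:16:14:1 architecture, which has no cross-layer connections. A minimal sketch of the arithmetic is given below; the function names are hypothetical.

```python
def count_full_connections(layers):
    """Connections in a fully connected feedforward net with no cross-layer links."""
    return sum(a * b for a, b in zip(layers, layers[1:]))

def percent_remaining(remaining, original):
    """Percentage of the original items that are still present."""
    return 100.0 * remaining / original

ORIGINAL_LAYERS = [22, 16, 14, 1]
original_nodes = sum(ORIGINAL_LAYERS)                     # 53 units
original_links = count_full_connections(ORIGINAL_LAYERS)  # 352 + 224 + 14 = 590
```

A pruned network with, for example, 64 connections would then be summarised as `percent_remaining(64, original_links)` per cent of the original.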
Extensive reduction in the original network architecture was produced from the automated application of both network pruning algorithms (Table 1). Marked differences were also observed to have arisen in the number and distribution of units, between the two different methods of pruning, and between the three different sets of training data. The input layer saw the greatest variation with the number of input units in the final solutions ranging from 3 to 15 units. The hidden units maintained a more balanced profile. Both hidden layers suffered losses of a similar number, with final numbers ranging from 9 to 14 units in the first hidden layer, and from 6 to 11 units in the second hidden layer. The total number of connections in each solution also exhibited considerable variation ranging from 64 to 258, and there appeared to be no explicit relationship between (a) the number of units in each layer, and (b) the total number of connections.
|Magnitude based pruning|
The automated application of the combined pruning and genetic algorithm procedures also produced extensive reductions in network architecture with marked differences in the number and distribution of units depending upon which combinations of data were used (Table 2). The most striking feature in this table is that the fittest networks all contained a full set of input nodes. Program records indicate that a limited amount of input node mutation occurred - but whether or not this aspect of the result is an outcome of low mutation probabilities, improved fitness performance from multiple inputs, or a spurious artifact associated with the training programme and node insertion procedure is for the moment unknown. The final outcome for both hidden layers, in contrast to the input layer, shows a massive reduction in the number of hidden nodes and a modest degree of between-network variation. Final numbers range from 2 to 12 units in the first hidden layer, and from 2 to 5 units in the second hidden layer, with most counts being much lower than those reported in the earlier experiments. The total number of connections in each solution also exhibited considerable variation ranging from 49 to 269 and again there appeared to be no explicit relationship between (a) the number of units in each layer and (b) the total number of connections. The counts in this instance were similar to those reported in the earlier experiments.
|Training data used in RPROP|
From the three network architecture diagrams (Figures 14-16) it can be seen in all cases that magnitude based pruning brought about a massive reduction in the number of weighted connections whilst at the same time maintaining a reasonable number of hidden units. In so doing this algorithm has created a much reduced network - which from a neural network perspective - has a rather simple looking structure and would require much less time and effort to train and run. Moreover, in all cases, it is evident that numerous input links have been maintained with the most recent past river flow value which has the greatest number of connections (FLOW t-1). Likewise all three networks have maintained several input links with current rainfall (RAIN t). What remains of the other input links from nodes associated with earlier FLOW and RAIN data is less clear cut and there appears to be some degree of variation from network to network - although the main focus of the networks is nonetheless on maintaining some link with previous FLOW and RAIN inputs. The 1984 model had a lot of these additional RAIN and FLOW links; in the other models this association was less marked. Models built on 1985 and 1986 data had no input connections with PET or CLOCK while the 1984 network maintained a link with both.
[Figures 14-16: symbols distinguish active nodes from inactive nodes]
As in the case of magnitude based pruning, skeletonization also reduced the number of weighted connections whilst maintaining a reasonable number of hidden units (Figures 17-19). In contrast to magnitude based pruning this algorithm has created networks that have similar or fewer inputs, although the final number of connections is much greater. Further details are provided in Table 1. From a neural network perspective these reduced networks would also require much less time and effort to train and run. Moreover, in all cases, several links have been maintained with the two most recent past river flow records. The 1985 and 1986 models also have links with other past river flow records, whereas the 1984 model does not. The models for 1985 and 1986 have maintained links with current rainfall (RAIN t); 1986 also with RAIN t-1. No other rainfall links exist. The 1984 and 1985 models both have links to CLOCK, and the 1986 model has links with past PET values.
[Figures 17-19: symbols distinguish active nodes from inactive nodes]
The output from the model breeding experiments is more difficult to interpret (Figures 20-25). As in the earlier pruning experiments, the applied combination of evolution-based model breeding and hard pruning brought about a massive reduction in the number of weighted connections. But, as explained earlier, in sharp contrast to the previous experiments all input units have been maintained - albeit with varying degrees of connection. The question of input relevance must therefore be determined from an examination of the connection patterns alone. The number of hidden units shows more substantial variation, from the least complex 22:2:2:1 solution to the most complex 22:12:5:1 solution. With the RPROP algorithm, using 1984 training data, both networks contained a large number of connections, most of which were associated with RAIN and FLOW inputs. PET inputs likewise had a substantial number of links, while CLOCK inputs were less pronounced. There is little or no real difference between the two network architectures. With the RPROP algorithm, using 1985 training data, both networks exhibited a state of almost full connection and no differentiation in terms of input relevance can be made. The two networks nevertheless look quite different because of massive differences in the number of hidden nodes: the 22:6:4:1 network has a substantial number of connections and the 22:2:2:1 network has just a few. With the RPROP algorithm, using 1986 training data, both networks contained a substantial number of connections. The network bred with 1984 fitness evaluation data contained a large number of connections, most of which were associated with RAIN and FLOW inputs. PET inputs likewise had a substantial number of links, and CLOCK inputs were again less pronounced. The network bred with 1985 fitness evaluation data exhibited a state of almost full connection with little overall differentiation. This network also has more hidden units in the first hidden layer, which means that it had a more substantial system of inter-connected parameters.
[Figures 20-25, in order: network architectures from the model breeding experiments. Key: active node, inactive node.]
using 1984 (training) and 1985 (fitness evaluation) data
using 1984 (training) and 1986 (fitness evaluation) data
using 1985 (training) and 1984 (fitness evaluation) data
using 1985 (training) and 1986 (fitness evaluation) data
using 1986 (training) and 1984 (fitness evaluation) data
using 1986 (training) and 1985 (fitness evaluation) data
One major problem in assessing neural network solutions is the use of global statistics. When neural networks are used to model one-step-ahead predictions the solution will in most cases produce a high or near-perfect goodness-of-fit statistic. Such measures therefore give no real indication of what the network is getting right or wrong, or of where improvements could be made. Indeed, neural networks are designed to minimise global measures, and a more appropriate metric that identifies real problems and between-network differences is now long overdue. But most other river flow prediction tools suffer from the same problem, so until such time as a recognised solution is available, one or more simple measures must suffice. Since there is no one right method or definitive evaluation test, a multi-criteria assessment was carried out. Eight global evaluation statistics were applied to each output and a brief description of each statistic is given below:
Test statistics related to these modelling operations are provided in Tables 3a to 3c. Each table pertains to one annual test data set, and from an examination of these tables it is apparent that there was no one best overall solution. The best result on each table for a given test statistic produced with a given set of validation data has been coloured red, e.g. the best validation result for the minimum error statistic produced with a network tested on 1984 data is -0.05744 nfu. Looking at the pattern of best performing statistics for each different type of model enables three main points to be extracted. First, the overall level of prediction is quite similar and all models have produced good results. There is no strong evidence of overfitting, which therefore validates the method of selection. Second, there is no single outright winner. The different models would appear to have different qualities, so in all cases the criteria for selection must be determined according to the task in hand, and the use of alternative objective functions should be considered, e.g. those which are specific to reservoir management, flood forecasting, or habitat preservation purposes. Third, the different training sets contained different types or amounts of information, and thus produced different levels of generalisation for each given situation. This problem would also have an important influence on the use of individual modelling solutions that were created for one period, or point in time, and then applied to another, which is in fact temporal extrapolation.
Test statistics related to the evolution-based approaches are listed in Tables 4a to 4c. Each table contains the test statistics relating to one annual test data set from which it is again apparent that there was no one best overall solution. It must be stressed at this point that the figures from these tables cannot be used for the purpose of a direct comparison with those provided in the earlier exercise. In the initial exercise an attempt was made to find the optimal modelling solution whereas in these follow-on genetic algorithm exercises the aim was to find an optimal architecture based on a mean error stopping condition of 0.0005 nfu. The two operations would therefore be expected to produce different levels of generalisation because the networks that were involved had been trained to produce different levels of output error. However, in other respects, the pattern of variation in the test statistics is similar to that which was produced in the earlier tables and so the three general observations that resulted from an examination of Tables 3a to 3c can also be applied to these figures.
All of the above statistics required some form of reduction. A simple scoring system (that could have been weighted in some manner or other) was first devised for the rapid identification of the two 'best' overall modelling solutions. On each individual table a score of one was given to the preferred or fittest solution for each best performing 'fitness evaluation' or 'unseen validation data' assessment statistic. No mark could be awarded for those scenarios in which the testing data were identical to those on which the network was trained. This procedure therefore equates to one mark being awarded for each statistical measure per annual data set. These values are coloured red on the error statistic tables and were explained in more detail earlier. Final marks are provided in Tables 5 and 6. To aid interpretation these scores are also summed to provide 'per training data' and 'per modelling process' totals.
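The scoring rule can be written out as a short sketch. It assumes that each model is identified by a (method, training year) pair and that each statistic is flagged as either lower-is-better or higher-is-better; the function name, the data layout, the model labels and every number below are illustrative only and are not taken from Tables 3 to 6. How ties were resolved is not stated, so here a tie simply goes to the first model encountered.

```python
def award_marks(results, lower_is_better):
    """Award one mark per statistic per test year to the best eligible model.

    results[test_year][statistic][model] = value, where model is a
    (method, training_year) pair. Models trained on the test year are
    excluded, since no mark is awarded when testing and training data coincide.
    """
    marks = {}
    for test_year, stats in results.items():
        for stat, by_model in stats.items():
            eligible = {m: v for m, v in by_model.items() if m[1] != test_year}
            pick = min if lower_is_better[stat] else max
            best = pick(eligible, key=eligible.get)  # ties: first encountered
            marks[best] = marks.get(best, 0) + 1
    return marks


# Illustrative values only.
lower_is_better = {"S4E": True, "%COE": False}
results = {
    1984: {
        "S4E": {("MBP", 1984): 0.10, ("MBP", 1985): 0.30, ("SKEL", 1986): 0.20},
        "%COE": {("MBP", 1984): 99.0, ("MBP", 1985): 95.0, ("SKEL", 1986): 93.0},
    },
    1986: {
        "S4E": {("MBP", 1984): 0.50, ("MBP", 1985): 0.40, ("SKEL", 1986): 0.10},
        "%COE": {("MBP", 1984): 90.0, ("MBP", 1985): 92.0, ("SKEL", 1986): 99.0},
    },
}

marks = award_marks(results, lower_is_better)
print(marks)
```

With eight statistics and three annual test data sets this rule distributes twenty-four marks per table. A weighted variant would add a per-statistic weight in place of the constant 1.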
In addition to the highest performer it is also useful to investigate the level of variation that has been exhibited for each data set between the different neural network solutions. High levels of variation would reflect marked differences in a test statistic, which is indicative of dissimilar generalisation or poor modelling capabilities. This could be applicable on an annual basis, on a more localised event-type basis, or on some combination of both. Further investigation of the output data would be required to provide a definitive answer to this question. Two further tabulations have therefore been constructed to provide appropriate between-model variation measures, which in this instance have been computed and reported using what is termed here the Coefficient of Determination (COD), i.e. standard deviation expressed as a percentage of the mean, a quantity more commonly known as the coefficient of variation (Table 7). Numerous differences in variation were observed to exist within the COD statistics, which ranged from a minimum of 2.10 % to a maximum of 129.13 %. S4E statistics exhibited the greatest level of variation and these numbers have been coloured red. %COE statistics exhibited the least amount of variation and these numbers have been coloured blue. Important patterns can also be observed within the variation statistics, between the various annual data sets related to each method, and between each annual data set for one method and its counterpart in the other.
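The between-model variation measure is simple to compute. The sketch below assumes the population form of the standard deviation (the text does not say whether the population or sample form was used), and the sample values are invented for illustration; `variation_percent` is a hypothetical helper name.

```python
import statistics


def variation_percent(values):
    """Standard deviation expressed as a percentage of the mean."""
    mean = statistics.fmean(values)
    # Population standard deviation is an assumption of this sketch.
    return 100.0 * statistics.pstdev(values) / abs(mean)


# Invented values of one test statistic across eight competing networks.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
cv = variation_percent(values)
print(f"{cv:.2f} %")
```

Because the measure divides by the mean, a statistic whose mean lies close to zero can produce a very large percentage even when the absolute spread is small, which is worth bearing in mind when comparing rows of such a table.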
Time series plots of forecast and actual flows were inspected for bias in network performance. It is important to determine whether the computed global measures applied to all stages of flow and in different seasons because it is possible to get significant statistical relationships on long time series, where the low flows are modelled in an accurate manner, but where high flows are in error. These visual plots were also used to check for a consistent temporal response. It was anticipated, for example, that there could be greater errors in forecasts for winter snowmelt events which are rare occurrences in the training data. It was also important to check: (a) the timing of events; (b) the accuracy of the starting point of rising flow; and (c) the ability to model peak discharge in terms of both time and volume.
Three representative hydrographs that are considered to provide some typical illustrations of the overall results were selected for graphical presentation (Figure 26[a - c]). These graphs contain information on three different individual 50 hour periods taken from 1986. Each individual hydrograph depicts the output response to a different type of situation: [a] low flow; [b] medium flow; and [c] high flow events. The three periods are not connected to each other in time but do however occur within a similar hydrological season and form part of a longer sequence of storm events. The two chosen models were those that produced the best overall score per model building exercise. Each graph contains three plots. Blue lines represent actual river flow values, red lines represent predictions associated with the best performing pruned network model (magnitude based pruning / 1985 training data), and green lines represent predictions associated with the best performing genetic algorithm network model (1984 fitness evaluation data / 1986 training data). The decision to use 1986 test data in these plots was based on three things. First, it was important to use data that had a high 'information content'. Second, to use data on which an optimised solution had been developed would be to use an unrepresentative sample. Third, to minimise exogenous factors, it was important to have a near temporal juxtaposition of storm events, comprising low flow, medium flow and high flow situations.
Each individual graph contains three plots: real values (blue); highest scoring pruning algorithm model predictions (red); and highest scoring genetic algorithm model predictions (green).
A series of initial investigations has been performed using automated approaches as an aid to resolving the complex task of designing an optimal neural network architecture for rainfall-runoff modelling, but the results were inconclusive. This was not surprising, given the nature of neural network modelling, wherein multiple solutions of a similar but not identical nature will be the most probable outcome of all such investigations. However, in the exploration process a lot of useful basic knowledge about the need for simple architectures has been gained. In addition, there is now some hard evidence to support previous suppositions about which inputs have the most important influence.
Different modelling operations resulted in different network architectures. One possible interpretation of the different architectures that were produced from the pruning exercises is that models trained on poorer data sets are having to look beyond the basic rainfall-runoff information in order to create a reasonable solution surface, and that this requirement varies both within and between the individual data sets. Both pruning results also lead us to conclude that different factors are being taken into account due to differences in the function that is being modelled, and that this problem will therefore be reflected in the form of poor results when attempting to transfer these final models from (a) testing with the original training data to (b) testing with validation data related to a different period in time. Despite all of the networks having produced a reasonable set of results, major variation in network complexity existed, which is another important feature that has emerged from these investigations. This is both of scientific interest and a possible cause for concern. From a positive standpoint these experiments have been investigating items at the 'mesostructure' scale, i.e. the manner in which the neural network is organised, including such features as the number of layers, the connection patterns, and the flow of information. But, perhaps, the 'microstructure' is also having a strong influence, e.g. the processing characteristics of each node within the network. These two levels should therefore be examined together in some linked manner. From a negative standpoint, given the overall lack of a consistent result, it is possible that no real optimal solution can be achieved and that what appear to be improved architectural solutions are in fact just manifestations of a random sampling process with no hidden meaning in the arrangement of the network nodes and weights.
These differences were most extreme in the model breeding experiments, where one solution had 4 hidden units and 49 weighted connections (Figure 23) and another had 17 hidden units and 269 weighted connections (Figure 25). However, it is also possible to conclude from these results that the exact intricacies of the architecture are not of critical importance, which in turn suggests that less effort should be expended on searching for an optimal solution when for most practical purposes a simple sub-optimal solution would be sufficient for the task in hand and a lot quicker to obtain. More radical and extensive analytical experimentation, coupled with a more detailed internal inspection of the final models, is therefore required to test this hypothesis.
Statistical assessment was problematic and no one definite winner could be established. Table 5 indicates that the outright favourite from networks created using standard procedures is magnitude based pruning trained on 1985 data. This solution was awarded ten out of the twenty-four marks. It was also a clear leader in terms of both training data and modelling process scores. Table 6 indicates that the clear favourite from networks that were created using random mutation with resilient learning and hard pruning is RPROP training using 1986 data coupled with GA fitness evaluation using 1984 data. This solution was awarded thirteen out of the twenty-four marks. It is interesting to note that for training purposes the 1986 data set now moves into first position, whereas the 1985 data set was the best performer in the earlier pruning exercises. The successful use of the 1984 data set for network fitness evaluation and model breeding purposes is however more controversial, since these data are known to be 'information poor' and, all other things being equal, should therefore have given the weakest performance. However, irrespective of these difficulties, the integrated combination of batch training, fixed stopping condition, and fitness evaluation has not just managed to create a reasonable modelling generalisation - but one which is also transferable to alternative data sets. Table 7 also contains some important information. High values in this table for a particular method indicate variable results. S4E, which was highlighted in red, exhibited the greatest degree of variation in all but one instance, which is interesting because it is this statistic that places particular emphasis on the model fit at peak flows. This means that it is the fit of the various neural network models to such phenomena which exhibits the greatest amount of variation across the numerous different solutions.
%COE, which was highlighted in blue, produced the least amount of variation per test data set and has therefore been unable to offer sufficient differentiation between the numerous neural network models. In various instances, marked similarities can also be observed between the results obtained from testing with 1985 and 1986 data, and marked differences likewise observed between these two results and those obtained for 1984. Some important between-method comparisons can also be made for various statistical measures related to each annual data set e.g. there are substantial differences associated with the results for 1984.
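The contrast between S4E and %COE can be made concrete with a small numerical sketch. The definitions used here are assumptions, since the descriptions of the eight statistics are not reproduced in this part of the text: S4E is taken to be a sum of fourth-power errors, and %COE is taken to be the coefficient of efficiency (one minus the ratio of squared residuals to the squared deviation of observations about their mean) expressed as a percentage. The flow values are invented.

```python
def s4e(obs, sim):
    """Sum of fourth-power errors (assumed definition of S4E)."""
    return sum((o - s) ** 4 for o, s in zip(obs, sim))


def coe_percent(obs, sim):
    """Coefficient of efficiency as a percentage (assumed definition of %COE)."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 100.0 * (1.0 - residual / spread)


# Invented hydrograph with a single peak.
obs = [10, 10, 20, 100, 40, 20, 10, 10]
# Model A: perfect everywhere except a peak that is 20 units too low.
sim_peak = [10, 10, 20, 80, 40, 20, 10, 10]
# Model B: four errors of 10 units spread around the event.
sim_spread = [10, 10, 30, 90, 50, 30, 10, 10]

print(coe_percent(obs, sim_peak), coe_percent(obs, sim_spread))
print(s4e(obs, sim_peak), s4e(obs, sim_spread))
```

Both models have the same sum of squared errors, so a squared-error measure such as %COE cannot separate them, whereas the fourth-power measure penalises the single large peak error four times as heavily. Under these assumed definitions this is consistent with S4E showing the greatest between-model variation and %COE the least.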
Graphical output associated with the highest scoring pruning algorithm solution [PAS] and the highest scoring genetic algorithm solution [GAS] was provided in Figure 26 [a-c]. These three hydrographs contain a wealth of additional information about the underlying functions that are being reproduced, and a clear indication of the output response that is associated with a given situation, for each individual model. In all three temporal windows it can be seen that most low flow situations have been modelled in a reasonable manner. PAS and GAS produce similar and accurate results for this section of the solution surface. Nevertheless, even at low levels of flow, when the level of flow is falling GAS generates a noticeable number of underpredictions. Next, looking at the small to medium events in hydrographs [a] and [b], these are also observed to be modelled in an acceptable manner, but with clear problems occurring in the timing and magnitude of peak flow predictions. PAS generates greater peak flow errors and these predictions are all late. GAS is therefore the better model in medium event situations - which is the exact opposite of that observed in falling low-flow instances. Last, hydrograph [c] shows a peaked high flow event in which the two models differ markedly. Neither of the two models performed particularly well. PAS was the better of the two, with reasonable timing but poor level prediction. GAS produced a broad near-flat peak that fell well short of the high levels required. From these simple observations it is apparent that the three hydrographs, together, confirm the statistical results. Low flow and limited change situations are modelled quite well. Peak flow events are not modelled that well, and considerable variation is observed to exist between the different modelling solutions and the manner in which these items are modelled.
Moving from description to explanation, it must of course be remembered that a direct comparison between the PAS and GAS models is limited because one model was produced from a desire to create an optimal solution, whereas the other model was produced from a desire to create an optimal architecture based on a fixed level of error. But one is still forced to wonder about the extent to which these various differences in output can be attributed to differences in the method of model creation. For example, did the use of a batch update procedure and a fixed stopping condition prevent the GAS model from producing more accurate high flow predictions? Questions of this nature are the subject of further research.
The discussion thus far has focused on neural network aspects of river flow prediction. Although much of the reported work has been of a computational nature it is also important to view these results from a hydrological perspective. Simple neural network models can be produced and evaluated using automated techniques. In this work several thousand models were created and tested using batch programs and overnight runs. This method of creating models has clear cost-benefit implications for model development and model application times. Although each neural solution is in fact just a combination of simple processing elements and weighted connections, the power of these interconnected processing elements to act in concert and produce complex non-linear models is tremendous. In the reported experiments acceptable results were produced from a limited number of input measurements. But these computational devices can also perform data fusion operations using different types of data, from various sources, and at different resolutions - which is something that was hitherto more or less unthinkable. Such capabilities have clear implications for the collection or purchasing of useful input data which is often not available or, if available, is in an inappropriate format. Most existing hydrological models also focus on peak flow prediction. However, whilst offering a complete hydrological modelling solution, the neural networks were also found to be excellent low flow predictors - which is of particular merit for water resource applications in drought prone regions. Other areas in which good low flow predictions would be beneficial might include reservoir management in drought periods or semi-arid locations, river balance planning and water supply operations, the design of irrigation and water extraction regimes, or the promotion of various ecological interests and aesthetic considerations.
Simple iterative learning, which is the main network optimisation tool that has up until this time been used in most geographical modelling operations, was extended in this research to create a more complex procedure that focused on the progressive removal of 'unimportant components' in a destructive cycle of training and pruning. To this operation network reconstruction sequences and fitness testing using alternative criteria were then added to create a powerful automated model building environment. In certain instances there was evidence to suggest that a more suitable network architecture, which had improved generalisation capabilities, had been found. However, in all cases there was a substantial reduction in network architecture, which produced neural network models that had fewer computational overheads. The removal of various non-essential inputs, which has clear implications for data collection requirements and information processing times, was another characteristic of the pruned networks. Further extended and more radical evolution-based river flow forecasting and prediction modelling investigations are now planned.
There is still no reliable scoring system in existence that can overcome the difficulties of measuring peaks and troughs or perform event-based separation of the appropriate statistical descriptors. Multi-criteria evaluation, with appropriate weightings based on specific end-user requirements, offers one possible method through which this goal can be achieved. But the application of all such subjective approaches must be looked at in a rigorous and comprehensive manner. There is also a pressing need for the creation of dedicated software programs that can perform multi-criteria assessment - perhaps in an interactive manner or with direct links to an EDA (Exploratory Data Analysis) toolbox.
Neural networks work. These computational tools offer great scientific promise and real practical benefits - but there is still a great deal more exploration that needs to be undertaken on the use of these tools and their application to solving different types of practical problems in different areas of geographical research. There is also a need to examine the available options for building better networks and to investigate the different methodologies for their efficacious application.
Abrahart, R.J. 1998. "Neural networks and the problem of accumulated error: an embedded solution that offers new opportunities for modelling and testing". Proceedings Hydroinformatics'98: Third International Conference on Hydroinformatics, Copenhagen, Denmark, 24-26 August 1998.
Abrahart, R.J. and Kneale, P.E. 1997. "Exploring Neural Network Rainfall-Runoff Modelling". Proceedings Sixth National Hydrology Symposium, University of Salford, 15-18 September 1997, 9.35-9.44.
Abrahart, R.J. and See, L. 1998. "Neural Network vs. ARMA Modelling: constructing benchmark case studies of river flow prediction". Proceedings GeoComputation'98: Third International Conference on GeoComputation, University of Bristol, United Kingdom, 17-19 September 1998.
Bathurst, J. 1986. "Sensitivity analysis of the Systeme Hydrologique Europeen for an upland catchment", Journal of Hydrology, 87, 103-123.
Beven, K., Kirkby, M.J., Schofield, N. and Tagg, A.F. 1984. "Testing a physically-based flood forecasting model (TOPMODEL) for three U.K. catchments", Journal of Hydrology, 69, 119-143.
Blackie, J.R. and Eeles, W.O. 1985. "Lumped catchment models". Chapter 11 in: Anderson, M.G. and Burt, T.P. Eds. 1985. Hydrological Forecasting. Chichester: John Wiley & Sons Ltd.
Dawson, C.W. and Wilby, R. 1998. "An artificial neural network approach to rainfall-runoff modelling", Hydrological Sciences Journal, 43, 1, 47-66.
Fischer, M.M. and Abrahart, R.J. (forthcoming). "Neurocomputing - Tools for Geographers". Chapter 8 in: Openshaw, S., Abrahart, R.J. and Harris, T.E. Eds. GeoComputation. Reading: Gordon & Breach.
Fischer, M.M. and Gopal, S. 1994. "Artificial neural networks: a new approach to modelling interregional telecommunication flows", Journal of Regional Science, 34, 503-527.
French, M.N., Krajewski, W.F. and Cuykendall, R.R. 1992. "Rainfall forecasting in space and time using a neural network", Journal of Hydrology, 137, 1-31.
Hsu, K-L, Gupta, H.V. and Sorooshian, S. 1995. "Artificial neural network modeling of the rainfall-runoff process", Water Resources Research, 31, 10, 2517-2530.
Johnstone, D. and Cross, W.P. 1949. Elements of Applied Hydrology. New York: Ronald. Cited in: Minns, A.W. and Hall, M.J. 1997. "Living with the ultimate black box: more on artificial neural networks", Proceedings Sixth National Hydrology Symposium, University of Salford, 15-18 September 1997, 9.45-9.49.
Karunanithi, N., Grenney, W.J., Whitley, D. and Bovee, K. 1994. "Neural Networks for River Flow Prediction", Journal of Computing in Civil Engineering, 8, 2, 201-220.
Knapp, B.J. 1970. Patterns of water movement on a steep upland hillside, Plynlimon, Central Wales, Unpublished PhD Thesis, Department of Geography, University of Reading, Reading.
Kohonen, T. 1995. Self-Organizing Maps. Heidelberg: Springer-Verlag.
Lorrai, M. and Sechi, G.M. 1995. "Neural nets for modelling rainfall-runoff transformations", Water Resources Management, 9, 299-313.
Minns, A.W. and Hall, M.J. 1996. "Artificial neural networks as rainfall-runoff models", Hydrological Sciences Journal, 41, 3, 399-417.
Newson, M.D. 1976. The physiography, deposits and vegetation of the Plynlimon catchments, Institute of Hydrology, Wallingford, Oxon. Report No. 30.
NERC (Natural Environment Research Council). 1975. Flood Studies Report, Vols 1-5. London: Natural Environment Research Council. Cited in: Minns, A.W. and Hall, M.J. 1997. "Living with the ultimate black box: more on artificial neural networks", Proceedings Sixth National Hydrology Symposium, University of Salford, 15-18 September 1997, 9.45-9.49.
Openshaw, S. and Openshaw, C. 1997. Artificial Intelligence in Geography. Chichester: John Wiley & Sons Ltd.
Quinn, P. F. and Beven, K. J. 1993. "Spatial and temporal predictions of soil moisture dynamics, runoff, variable source areas and evapotranspiration for Plynlimon, Mid-Wales", Hydrological Processes, 7, 425-448.
Raman, H. and Sunilkumar, N. 1995. "Multivariate modelling of water resources time series using artificial neural networks", Hydrological Sciences Journal, 40, 2, 145-163.
Riedmiller, M. and Braun, H. 1993. "A direct adaptive method for faster backpropagation learning: The RPROP algorithm". In Proceedings ICNN'93: IEEE International Conference on Neural Networks, 1993.
Rizzo, D.M. and Dougherty, D.E. 1994. "Characterization of aquifer properties using artificial neural networks: Neural kriging", Water Resources Research, 30, 2, 483-497.
Rogers, L.L. and Dowla, F.U. 1994. "Optimization of groundwater remediation using artificial neural networks with parallel solute transport modeling", Water Resources Research, 30, 2, 457-481.
Schaap, M.G. and Bouten, W. 1996. "Modelling water retention curves of sandy soils using neural networks", Water Resources Research, 32, 10, 3033-3040.
Smith, J. and Eli, R.N. 1995. "Neural-Network Models of Rainfall-Runoff Process", Journal of Water Resources Planning and Management, 121, 6, 499-509.