Automatic Analysis Rule Generation Using Genetic Algorithms

Shane Murnion1 and Steve Carver2
1Department of Geography, University of Portsmouth, Buckingham Building, Lion Terrace, Portsmouth PO1 3HE, United Kingdom
2School of Geography, University of Leeds, Leeds LS2 9JT, United Kingdom

One of the major shortcomings of current GIS is the lack of tools for exploratory analysis of the relationships that may exist within datasets. Conventional statistical methods such as regression and principal components analysis can be used for this purpose, but their effectiveness is limited. Firstly, an inexperienced user may choose and apply an inappropriate method. Secondly, even an experienced analyst must make assumptions about the data, e.g. whether it follows a normal (Gaussian) distribution or some other distribution, which makes working with small datasets problematic. Thirdly, assumptions have to be made about the nature of the relationship between the datasets, e.g. whether the relationship is linear or non-linear, first order, second order, etc., or some combination of these. Finally, accurate statistical analysis also requires a good understanding of the nature of the data and of the independence of the variables used, which causes difficulties in spatial analysis, where the variables are often highly correlated. It is the complexity arising from these obstacles that makes expert systems so appealing as guides in solving difficult problems. Unfortunately, creating an expert system can itself be a difficult and time-consuming process. One of the major problems is the need to define explicitly the rules that embody the knowledge base of relationships connecting different datasets. Recently, unconventional methods involving neural algorithms have been introduced which can automatically detect and model such relationships.
Neural networks have many advantages in this type of analysis: they do not require knowledge of the statistical distributions of the data, they can be applied to small datasets, they can handle certain types of noisy data, and no assumptions need to be made about the nature of the model. When using a neural network it is not even necessary to know which variables are important, since the network can recognise the important relationships automatically during training. However, there is one major problem with using a neural network in this fashion: neural algorithms usually create 'black-box' models, i.e. it is often impossible to obtain information about the model contained within the trained network. In this paper another unconventional analysis method is examined which offers some of the advantages of a neural algorithm but produces a 'white-box' model that is accessible to the user. This work shows how genetic algorithms can be used to extract a rule set that explicitly states the relationship between potential input data and a desired map output. A simple example analysis is considered: defining suitable sites for dumping radioactive toxic waste from appropriate input maps. Using an example result map, it is demonstrated how the genetic algorithm can find the correct sequence of operations that links the output result map to the input data. The rules produced by this method are accessible to the user and, once created, can be applied to new datasets using conventional analysis methods. The method shown here therefore has dual potential, as a data-mining method and as a way of automatically creating expert systems.
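
The idea of searching for a rule set that reproduces an example result map can be illustrated with a minimal sketch. It is not the authors' implementation: the rule encoding (one gene per input layer, meaning ignore, require or exclude), the layer names, the randomly generated rasters and all GA parameters are assumptions chosen only for illustration. Each candidate rule is applied as a Boolean overlay of the input maps, and its fitness is the number of cells that agree with the example result map.

```python
import random

# Hypothetical rule encoding: one gene per input layer.
IGNORE, REQUIRE, EXCLUDE = 0, 1, 2
GENE_NAMES = {IGNORE: "ignore", REQUIRE: "require", EXCLUDE: "exclude"}

def apply_rule(rule, layers):
    """Boolean overlay: a cell is suitable if it satisfies every gene."""
    out = []
    for i in range(len(layers[0])):
        ok = True
        for gene, layer in zip(rule, layers):
            if gene == REQUIRE and not layer[i]:
                ok = False
            elif gene == EXCLUDE and layer[i]:
                ok = False
        out.append(ok)
    return out

def fitness(rule, layers, target):
    """Number of cells where the rule's output matches the example map."""
    return sum(a == b for a, b in zip(apply_rule(rule, layers), target))

def evolve(layers, target, pop_size=30, generations=40,
           mutation_rate=0.1, rng=None):
    """Simple GA: elitism, tournament selection, one-point crossover.

    Assumes at least two input layers (needed for the crossover point).
    """
    rng = rng or random.Random(0)
    n = len(layers)
    genes = (IGNORE, REQUIRE, EXCLUDE)

    def score(rule):
        return fitness(rule, layers, target)

    pop = [[rng.choice(genes) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        if score(pop[0]) == len(target):
            break  # a rule reproducing the example map has been found
        next_pop = [pop[0][:]]  # elitism: keep the best rule unchanged
        while len(next_pop) < pop_size:
            a = max(rng.sample(pop, 3), key=score)  # tournament of size 3
            b = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            child = [rng.choice(genes) if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=score)

# Illustrative data only: five random Boolean rasters of 200 cells each.
LAYER_NAMES = ["stable_geology", "near_population", "conservation_area",
               "road_access", "flood_risk"]  # hypothetical layer names
data_rng = random.Random(42)
layers = [[data_rng.random() < 0.5 for _ in range(200)] for _ in LAYER_NAMES]

# The "unknown" rule used only to build the example result map.
HIDDEN_RULE = [REQUIRE, EXCLUDE, EXCLUDE, IGNORE, IGNORE]
target = apply_rule(HIDDEN_RULE, layers)

best = evolve(layers, target)
for name, gene in zip(LAYER_NAMES, best):
    print(f"{name}: {GENE_NAMES[gene]}")
print("cells matched:", fitness(best, layers, target), "of", len(target))
```

Because the evolved rule is a plain list of per-layer conditions, it can be read directly by the user and reapplied to new input maps with `apply_rule`, which is the 'white-box' property discussed above. A realistic rule language would need a richer set of operations (buffers, thresholds, weighted overlays) than this three-valued encoding.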