James Macgill, Stan Openshaw,
Department of Geography,University of Leeds, LEEDS LS2 9JT, United Kingdom
Email: [j.macgill, stan]@geog.leeds.ac.uk
Artificial Life is becoming a topic of major international interest in a wide variety of fields, yet is largely unknown in the social sciences. This paper looks at one investigation into the nature and utility of Artificial Life in an exploratory geographical analysis context. Attention is focused on the application of ideas related to flocking behavior and swarm optimisation to develop new types of spatial analysis technology.
The well documented 'data explosion' that has happened since the GIS revolution of the mid 1980's (Openshaw 1995), due partly to the ability to link many databases to geography through postcodes (Raper 1992) has resulted in the rapid build up of data that is, in a large number of cases, simply being archived rather than analysed. The lack of suitable tools has resulted in the under or non- analysis of many key spatial data sets. Additionally, the perceived complexity of the geographical analysis task is so great that there has been a reluctance to even attempt the analysis of many potentially important databases. However, this expansion in the availability of data is not restricted to the field of Geography. The widespread use of Bar-codes, computerization of almost every transaction, the introduction of customer 'loyalty' cards, and the creation of many huge scientific and engineering databases has generated an urgent need for new techniques and tools that can intelligently transform processed data into useful information and knowledge (Chen 1996).
A new field has developed around the need to make some use out of this accumulation of data, that of 'Knowledge Discovering in Datasets' (Fayyad 1996). The ultimate aim of KDD is to develop new technologies able to automatically sift though this data and provide its users with new, meaningful and useful knowledge. Chen (1996) points out that whilst it would be ideal for there to be an all purpose knowledge discovery system, the wide range of data types and diversity of data mining goals of different fields mean that it is only practical for specific data mining systems to be constructed for knowledge mining on specific kinds of data, such as relational databases, transaction databases, spatial databases, multimedia databases, etc.
It is here that this research steps in. At present KDD research is targeted somewhat more at the requirements of relational databases and transaction databases, with very little aimed at the specific problems and end goals presented by spatial databases. In the field of spatial databases the geographical analysis and explanation machines developed by Openshaw et al have proven themselves to be extremely powerful and useful tools. However, as data expands in both size and number of dimensions their exhaustive 'search everywhere' techniques become very computationally prohibitive. In fact the GAM and GEM geographical data mining tools only really work well in explorations of two dimensional map spaces. The GAM scales well with increasing data set sizes but cannot handle time, while GEM can handle large numbers of observations but not a large number of map . Therefore newer and smarter methods that are able to find and concentrate on smaller more 'interesting' regions of the data, cutting down on the amount of computation required, are needed. One methodology is to develop 'smart' exploration strategies based on ideas from the field of Artificial Life that would be able to perform far more intelligent and hence more focused searches enabling it to perform in higher dimensional map spaces.
Artificial Life, in common with many Artificial Intelligence techniques, is receiving a lot of attention in a wide variety of fields, yet so far applications of this exciting, flexible and potentially very powerful paradigm are still extremely rare. The field of AL encompasses the development of computer programs, and in some cases robots, that try to reproduce or mimic the behaviour of life (either as it is or as life 'could be') see Langton (1987). There are two main perspectives behind the development of Artificial Life technologies, the first is from a biological stand point where the objective is to study Artificial Life in order to gain an insight into life as it exists on earth. The second reason for research is to see if any of the strategies and behaviours evolved by biological life forms can be reproduced and adapted in a way that is of use to scientists. It is this latter reason that is of particular interest here. The challenge for geographical analysis is to see whether more of the AL ideas can be used in a GIS environment to develop "smarter" and hence more efficient forms of exploratory spatial analysis that can cope with the size and complexity of the modern GIS world.One of the key Artificial Life technologies to be examined and adapted for this work is based on the behaviour of flocks and swarms. This paper discusses this approach and how the emergent behaviour of interaction between flock members might be used to form an effective search strategy for performing exploratory geographical analysis. The idea is simple enough. You set loose a flock of circles of different sizes that test the area they contain for evidence of localised geographic patterning. The paper describes various ways of achieving this result. The best circles then act as attractors for all the others. The entire flock then moves towards the current best results but at different velocities. The basic Reynold (1987) flock has been modified to perform a swarm optimisation relevant to geographical analysis. In particular it is important to allow multiple different pattern optima to appear. Additionally, experiments are reported which describe the effects of using warning beacons to repel flocks from poor areas that have already been explored and to investigate various hybrid search mechanisms.
The nature of systems with emergent behaviour is that it is hard to predict exactly how the system will function for a given situation in advance. In order to carry out experiments and develop strategies for improving performance a visualisation tool was required which could show the flock-based search in action. The paper briefly describes the tool that was developed to do this using Java and the advantage Java gives when it comes to distributing results generated by the search process.
Finally, the method is demonstrated on some real data sets and the results compared with GAM. There is a brief discussion of how other artificial life technologies may be used to further develop this approach to exploratory geographical analysis.
One topic of spatial analysis research that has attracted some attention, as well as some initial solutions, is that of spatial cluster analysis. The use of clustering algorithms has been hampered in non spatial fields by the difficulty in defining meaningful natural similarities between data. However, in geography the similarity to cluster is implicit in the spatial nature of the data. As it is felt that the problem of spatial cluster analysis has not yet been sufficiently 'solved', it seemed an appropriate task to set for Artificial Life to tackle.
|1a 'Simple' Cluster analysis||1b Cluster analysis with noise|
|1c With a background population||1d And now with attributes|
Figure 1 is an illustration of increasingly harder levels of cluster analysis problems. There exist a number of techniques for dealing with 'simple' cluster analysis (fig 1a) where the objective is simply to find occurrences of a phenomena, and many of these, including those employing image processing, mathematical analysis and neighbourhood analysis manage to a lesser or greater extent to deal with the second level (fig 1b) where in addition to the clusters to be found there is some noise in the data. Virtually all techniques however fail or at least struggle when faced with a background population from which to find the clusters (fig 1c) Here the problem is not just to find clusters, but to find clusters that stand out in relation to their surrounding population. In (fig 1c) for example the central cluster from 1 & 2 is probably no longer of interest as it lies in the centre of a dense population. The final level of cluster analysis (fig 1d) is when the problem moves into multiple attribute/time space, here we are in relatively uncharted territory (at least as far as spatial analysis is concerned.
Kaufman (1990) details two methods for cluster analysis; the first, Partitioning Around Mediods (PAM), partitions a set of objects into k clusters by finding k objects to represent the heart of each cluster (mediods). The remaining objects are then assigned to their closest mediod. It is only recommended for use on small datasets, The second method CLARA uses random sampling of the data to allow it to be used for far larger data sets. Cluster analysis techniques such as those described above have been applied, without too much success to general data mining and machine learning as there is difficulty in defining natural notations for similarity. Ng (1994) points out that for purely spatial analysis there exists an implicit natural notation of similarity as we can simply use distance notations (e.g Euclidean or Manhattan Distances). In order to make best advantage of this Ng goes on to develop a new clustering method derived from PAM and CLARA, called CLARANS in two varieties; one Spatially Driven and one Non Spatially Driven.
Ester (1996) looks at existing clustering algorithms including those mentioned above and sets out the following requirements for cluster analysis of large spatial data sets:
All the existing conventional methods are limited when it comes to spatial analysis especially as whilst some of them are able to deal with second level problem (noise)none are able to go on to the problem of a background population. One of the best techniques available at the moment for this kind of work is GAM.
Fig 2. Brute force search everywhere GAM approach
Fig2 is illustrative of the kind of thing GAM does. In this example one circle size is used and is shown sweeping from bottom left to top right. Regions that GAM might find as significant are marked in yellow and left behind. It is interesting to note that GAM would probably not identify the central region as, although it has a number of cases, they are found in a densely populated region and it is therefore probably not unusual to find so many cases located there
Fig 3 At multiple scales
The animation above shows the importance of searching at multiple scales. Pattern is scale and aggregation dependent and different patterns may well be expected at different scales.
The choice of circle size is critical to the outcome of the search and indeed it is unlikely that any one circle size will reveal all of the pattern. For this reason GAM actually repeats the search using a large number of different circle sizes. Fig 3 shows how the results of five searches using different circle radii can produce different results, note how GAM manages to ignore the central cluster in all but one of the scales.
Despite its usefulness, there is a need for more flexible and efficient exploratory geographical analysis methods relevant to GIS. The principal problem is that the non-intelligent (brute force) search used in GAM does not scale to higher dimensional data.
Once you introduce time into the analysis so that the search is for localised space-time clustering then the problems of using a brute force search become instantly apparent. For instance, whilst GAM can speedily generate and test many millions of circles in a few seconds, if each circle has 2m-1 possible permutations of alternative groupings or divisions of the time element, then compute times can increase by the order of four magnitudes (depending on the time granularity of the data). GAM now needs an exceedingly fast High Performance Computer. Then there is the extra difficulty that it is no longer so easy to test for localised unusualness. Similar problems can be found if clustering is looked for in space and by type. Openshaw 1994,5) argues that why should we strangle the data we wish to speak and tell us where the strongest patterns are located by imposing inappropriate time and / or case classifications on it due to our ignorance. If exploratory analysis and data mining methods are ever to achieve their full potential then they need to be able to explore data in all its complexity without it being innocently interfered with by pattern and process ignorant researchers.
It would appear that whilst computational power is still increasing, our ability to aquire data may well be growing faster. There is therefore a need for another technology that is capable of scaling to multiple data domains and which operationalises the STAC approach of Openshaw(1994). Looking to KDD and the study of technologies for exploring Very Large Data Bases, many have this requirement of applying intelligence to the search techniques. It is felt that in order to attack the problems faced by the exploration of spatial/time/attribute databases without exponentially expanding computational requirements, it is necessary to adopt some of the Artificial Intelligence techniques and find ways of best applying them to the spatial domain.
A number of potentially useful smart search methods exist that could be used to create computationally intelligent geographical analysis tools. Openshaw (1994), Openshaw and Perrée (1996) used genetic algorithms to drive the exploratory search process in two and many dimensional databases. Here attention is focused on the use of flocking and swarm optimistation method to perform a smarter function in the expectation that their methods will subsequently scale to more complex problems.
Flocks provide a valuable exploration/optimisation technique. The principle here is to have a large number of small pattern hunters that move through the GeoCyberspace looking for patterns but then communicate their findings to each other.
Flocks were originally invented as a method for understanding flocking behavior of birds and how to represent it on a computer. The resulting artificial 'birds' are called 'boids', see Raynolds(1987). A number of 'boids' have been created from fairly simple rules to mimic various animal behaviours.
The impetus for this research came partially from Eberhart et al(1996) research into swarms as an optimization technology and partly from an interest in whether the hunter/foraging behaviours modelled in AL could be used to hunt and forage for patterns and clusters in spatial data.
The way rules are set up to govern the behaviour of a flock is in much the same way as many multi-agent systems operate; simple rules are applied to each individual agent (in this case each bird/boid) and it is the interaction of these simple rules that gives rise to the complex and powerful behaviour of the system as a whole.
For a normal, non-pattern hunting flock the rules for each boid are as follows:
Using just the above rules creates a single flock that moves and swoops around together. By adding different coloured boids it is possible to have multiple flocks that move around whilst avoiding each other. The following rules implement this as well as the avoidance of static barriers.
What we wanted to do was develop a species of boid that could be used for exploratory geographic analysis; a geobot!
The following looks at how the basic behaviour of a flock was modified in order to develop a behaviour targeted at performing an efficient search strategy. The rules were developed a stage at a time. After each new rule or modification the behaviour of the boids was observed and a decision made as to whether to keep,drop or modify the rule. This process is on-going and new rules, such as ones controlling boid size, still need to be developed.
As can be seen from the above rules for a simple flock, each boid bases its decision regarding attraction and repulsion to other boids based on their colour. In normal flocks a boid will always remain the same colour and hence part of the same flock for the entire duration of the simulation. For our purposes however, we give the different colours a meaning and change the colour of each boid according to its success at finding localised clusters.
The rules used to govern the smart flock are set out below.
The speed rule may seem slightly counter intuitive at first. The aim in fact is to slow down boids that are exploring interesting areas and hence study them more closely whilst boids that are over uninteresting or empty regions accelerate in order to get out of there.
The result of the colour change is that boids are only attracted towards other boids that are doing well in their pattern hunting whilst actively avoiding boids that are doing poorly. However, regardless of colour a boid will steer away from another boid if it is so close that the search would be duplicated and hence wasted.
In the most severe case, where there is no data inside the boid it dies. The meaning in this context is that it stops moving and becomes a warning beacon discouraging other boids from approaching the area, the net result of this is that after a short time a series of 'dead boids' effectively mark off the empty regions of the map forcing the rest of the flock to concentrate on the areas containing interesting data.
A sample geobot changes colour as it evaluates its position.
Circle represents size and position whilst line indicates direction of motion.
The following diagram summarizes the five boid states currently used.
The rules are only half of the story, most of the rules require some form of parameterization; blue boids for example travel faster, but how much faster should that be? At what point is a result interesting enough to warrant further investigation (red stage) but not interesting enough to mark permanently (yellow stage)?
The second rule states 'Find all detectable (in range) boids...' what is that range? At present these parameters are being set, perhaps somewhat arbitrarily. In the future though, it seems logical to take the ALife paradigm to its logical conclusion and start to breed flock members. Perhaps using genetic algorithm techniques with the most successful pattern hunting boids reproducing to evolve the best parameters.
For the moment though, there is still scope for humans to set the parameters and to watch the differing outcomes, and if necessary to interfere with the search by re-directing attention to a different part of the map.
Flocks are not the only approach under investigation; an earlier approach utilized Genetic Algorithms. They suffered from a number of drawbacks, but advances in visualization techniques as well as a better understanding of how two dimensional space should be presented to GAs means that it may well be time to resurrect the GA approach and perhaps integrate it into the multi search approach outlined above. The basic rule of thumb when building smart systems is hybridisation, so don't put all your eggs in one basket. Indeed, perhaps one of the most interesting lines of further research lies not in the development of new search strategies but in how existing strategies can be made to co-operate in a search. The idea here is to have a centralized 'result surface' shared by multiple search techniques. Each search engine will be able to write results to the surface and, if they employ some form of smart navigation like the flocks, read results back from it to aid navigation.
Using such a mechanism would make it possible to make the best use of the best parts of each technology:
There is no reason why a combination of different search mechanisms cannot be combined to create a hybrid system that communicates and shares experiences at regular intervals. Their systems can be built at different levels of granularity making use of private or public communication networks. CORBA provides one such communication system that would allow search tools written in different languages and running on separate computers to co-operate in this way.
The power of multi agent systems like flocks comes from the high level of emergent behaviour that arises from the complex interaction of the simple rules. The major drawback from this is that it is very hard, if not impossible, to calculate or predict exactly what behaviour will arise from a particular rule or parameter set. Because of this insights and fine tuning of the search process must be done by observation. In order to do this it was necessary to build a visualization system that would allow the researcher to watch the flock move and interact both with each other and the data under analysis.
In addition, as searches through very large data sets could still take a considerable amount of time and it could be very wasteful to lock the user out of the process, using live visualization techniques of the search and the results as they emerge could allow a user to make suggestions to the search as it progresses.
A user may, for example, add barriers or attractors to limit or concentrate the search into particular areas in order to improve the efficiency of the search and to gain extra insights into the regions that the user finds interesting. When searching a very large area a user may zoom into an interesting area and re-start the search at this new finer scale. As a last resort a user may abort a search prematurely if it is obviously failing to find anything at all.
The real-time visualization as it stands carries a heavy computational cost and places a time penalty on the search. However, the kind of visualization required for real world search <-> end user interaction will probably be quite different from the existing search <-> researcher interaction as this is predominantly aimed at providing the researcher with a full representation of the search process including the position and behaviour of every boid at every iteration. An end user however would be more interested only in the most interesting results as and when each one is found and this sort of animation should place far less of a load and hence time penalty on the computational resources available.
As the visualisation engine has been developed to work independently from the search engine it is actually possible to run the two on separate machines, with one dedicating itself to the search whilst others run the visualisation to view and interact with it. This process has been simplified by the fact that the visualisation engine has been written in Java allowing users to view active and archived searches over the internet using a web browser.
If you are reading this paper using a Java 1.1 enabled browser (see table 1) then follow this link to a page containing an example of the swarms searching the city of Baltimore for crime patterns.
This paper has illustrated how flocking algorithms borrowed from AL can be adapted to perform exploratory geographical analysis. The results are very interesting and yield the same results as benchmarks on GAM. The superior search power of AL based Geographical Analysis Methods will soon triumph over computational brute force methods and provide a framework for the subsequent development of other types of smart geographic analysis technology. Well, so we believe and the flocks described here provide a framework that can be developed further.
Chen , M. et al (1996): Data Mining: An Overview from a Database Perspective,IEEE Trans. on knowledge and data engenering, 8(6):866-883.
Eberhart, R. et al(1996):Computational Intelligence PC Tools,AP Professional,USA.
Fayyad, U.M.,et al(1996): Advances in knowledge discovery and data mining. AAAI Press Cambridge, Mass. MIT Press. London.
Kaufman,L. (1990): Finding groups in Data: An introduction to cluster analysis. Jhon Wiley and Sons.
Langton, C (1988): Artificial Life 1Addison Wesley Longman.
Ng ,R.(1994): Efficient and Effective Clustering Method for Spatial Data Mining", Proc. In'l, Conf Very Large Databases, pp144-155, Santiago.
Openshaw, S. (1994): ITwo exploratory space-time-attribute pattern analysers relevant to GIS. In: Fotheringham, S., Rogerson, P. (Eds): Spatial Analysis and GIS. Taylor & Francis.
Openshaw, S. (1995): Developing automated and smart spatial pattern exploration tools for geographical information system applications. The Statistician, 44, No 1, pp 3-16.
Openshaw, S., Perrée, T. (1996): User-centered intelligent spatial analysis of point data. In: Parker, D. (Ed) Innovations in GIS 3, 119-124 ,Taylor and Francis, London
Raper, J.F., Rhind, D., Shephers, J.W. (1992): Postcodes: the new geography. Longman, Harlow
Raynolds, C,W. (1987): Flocks, herds, and schools: A distributed behavioral Model. Computer Graphics,21(4):25-34.