GeoComputation 2000 HomeConference ProgrammeAlphabetical List of AuthorsPaper

Using Genetic Algorithms in Clustering Problems

BAO, F.L. and PAINHO, M.
New University of Lisbon, Portugal

Key words: Cluster Analysis, Genetic Algorithms, Unsupervised Neural Networks, K-Means

Clustering has always been a key task in the process of acquiring knowledge. The complexity and especially the diversity of phenomena have forced man to organise objects based on their similarities. One can say that the objective of cluster analysis is to sort the observations into groups such that the degree of natural association is high among members of the same group and low between members of different groups (Anderberg, 1973). The complexity of such tasks is easily recognized due to the number of possible arrangements. Even a small number of elements (e.g., 25) to be clustered in a small number of groups (e.g., five) produces a very large number of possibilities (2,436,684,974,110,751). Sensitivity to initial points and convergence to local optima are usually among the problems affecting interactive techniques such as K-Means (Bradley and Fayyad, 1998). Emerging technology promises to help achieve better clustering results. Artificial neural networks have been mentioned as a step forward in improving clustering by a number of authors. Openshaw, Blake and Wymer (1995) argue that the use of a neuroclassifier provides a much more flexible and potentially superior means of generating census classifications, a known clustering problem. But there are other methods. In this paper a genetic algorithm (GA) approach to this problem is proposed. The optimisation capabilities of genetic algorithms are well known and commonly used in a variety of scientific fields. Openshaw and Openshaw (1997) note that genetic algorithms are an extremely powerful, widely applicable search technique that provide a global search for problems with many local suboptima. Here the clustering task is tackled using a genetic algorithm which tries to minimize the intra-cluster variability.

The genetic algorithm is tested against the traditional K-Means method and an unsupervised neural network (Kohonens self organising map). First, a particular environment is set up using artificially generated data, then a real world example involving the electoral results in Portugal is used. Another important aspect is related to the possibility of using GA in geographical clustering problems. It seems obvious that the flexibility offered by GA should encourage researchers to develop new and improved ways to tackle clustering tasks in geography.