Spatial Analysis of Geochemical Data Using Fuzzy Clustering and Non-Linear Mapping

Zhiqiang Feng
Department of Geography, Lancaster University, Lancaster LA1 4YB, United Kingdom

Research on spatial patterns in geochemical data is an important theme with both theoretical and applied value. The analysis is relatively complicated because of the complex interactions among diverse environmental factors such as geology, hydrology, landforms, and human activities. Further research is needed on how multivariate statistical methods can make full use of these data to model and estimate spatial patterns.

Approaches based on fuzzy set theory emerged only in the mid-1970s, when the theory had become fairly widely accepted and algorithms became available. There are two kinds of fuzzy classification: the semantic import model (SI) and fuzzy k-means clustering (FKM) (Gaans, 1993). The SI model depends on the existence of a well-defined and functional classification scheme. In contrast, FKM is an unsupervised classification approach and does not need prior knowledge. Generally speaking, FKM, also known as fuzzy c-means (FCM), is better developed and is widely used in diverse disciplines.

The application of a continuous or fuzzy classification for geographical phenomena arose for the following reasons:

  1. Many geographical phenomena vary gradually over space and classification boundaries cannot be simply defined as crisp lines.
  2. Even within classification units, variation still exists.
  3. The simplification inherent in conventional classification methods results in considerable information loss.
  4. Measurement errors also justify the application of fuzzy methods.
  5. For some practical tasks, requests may be expressed in natural language, which can be vague, necessitating an overlapping classification to meet the demands of users.

Fuzzy cluster analysis can be viewed as an extension of traditional cluster analysis. In crisp clustering a sample is forced into exactly one of several clusters according to some rule or algorithm. In fuzzy clustering, however, a sample with multiple attributes can belong to several groups at the same time, with a membership value assigned for each group. As a result, fuzzy clustering can reduce the distortion produced by outliers or intermediate points to a minimum, and the classification results reflect the structure of the data more closely.

Non-linear mapping (NLM) is a method for projecting data from a high-dimensional space into a lower-dimensional one. It attempts to minimise the distortion of inter-point distances between the two spaces. The algorithm was first presented by Sammon (1969).

The research is undertaken on a geochemical data set from Vancouver Island, Canada. The original purpose of the geochemical survey was mineral resource exploration. The primary data comprise 1139 sample points with 20 variables, from which we choose 13 variables for analysis on the basis of summary statistics. First, we apply a logarithmic transformation to the data. After comparing box-plots and NLM as methods for identifying outliers, we employ NLM to pick out outliers, with the data subdivided by bedrock unit. FKM is then used to classify the whole data set into groups, and the detailed usage of FKM is also discussed. The results are displayed in attribute space by NLM and in geographical space by map. Finally, "hardened" groups are produced according to the membership values.
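The classification step can be illustrated with a minimal sketch of fuzzy k-means, assuming Euclidean distance and a fuzziness exponent m = 2. Neither choice is stated in the text, and the function name, parameters, and synthetic data below are hypothetical and are not the survey data or the implementation used in this research. The sketch log-transforms the data, computes memberships, and then "hardens" each sample to its highest-membership group.

```python
import numpy as np

def fuzzy_k_means(X, k, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Minimal fuzzy k-means (fuzzy c-means) sketch.

    X: (n_samples, n_features) array; k: number of groups;
    m: fuzziness exponent (> 1), an assumed value here.
    Returns (centres, memberships).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships; each row sums to 1.
    U = rng.random((n, k))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Centres are membership-weighted means of the samples.
        W = U ** m
        centres = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distance from every sample to every centre.
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # Membership update: u_ij proportional to d_ij^(-2/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centres, U

# Synthetic, well-separated log-normal "concentrations" (illustrative only).
rng = np.random.default_rng(42)
raw = np.vstack([
    rng.lognormal(mean=1.0, sigma=0.2, size=(50, 3)),
    rng.lognormal(mean=3.0, sigma=0.2, size=(50, 3)),
])
X = np.log10(raw)                  # logarithmic transformation
centres, U = fuzzy_k_means(X, k=2)
labels = U.argmax(axis=1)          # "hardened" groups
```

Samples whose largest membership is only slightly above 1/k are the intermediate points; inspecting U before hardening keeps that information visible.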

The results show that a combination of NLM and FKM provides a set of powerful tools for analysing spatial patterns. The spatial variation of geochemical parameters in the region is influenced primarily by lithology.
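For readers unfamiliar with NLM, the following is a minimal sketch of a Sammon-style mapping. It minimises Sammon's stress by plain gradient descent with a simple adaptive step size, which is a simplification assumed here for brevity and is not the update rule of Sammon's original algorithm. All function names, parameters, and the synthetic data are hypothetical, and the sketch assumes there are no duplicate points.

```python
import numpy as np

def pairwise_dist(Z):
    """Euclidean distance matrix between the rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def sammon_stress(D, d):
    """Sammon's stress between original (D) and mapped (d) distances.

    Assumes no duplicate points, so every off-diagonal D_ij > 0.
    """
    iu = np.triu_indices_from(D, k=1)
    return ((D[iu] - d[iu]) ** 2 / D[iu]).sum() / D[iu].sum()

def sammon_map(X, n_dims=2, n_iter=500, lr=1.0, seed=0):
    """Project X to n_dims by gradient descent on Sammon's stress."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    D = pairwise_dist(X)
    eye = np.eye(n, dtype=bool)
    D_safe = np.where(eye, 1.0, np.maximum(D, 1e-12))
    c = D[np.triu_indices(n, k=1)].sum()
    Y = rng.normal(size=(n, n_dims))  # random starting configuration
    stress = sammon_stress(D, pairwise_dist(Y))
    for _ in range(n_iter):
        d = pairwise_dist(Y)
        d_safe = np.where(eye, 1.0, np.maximum(d, 1e-12))
        # Weight on (y_i - y_j) in the gradient of the stress.
        w = (D - d) / (D_safe * d_safe)
        np.fill_diagonal(w, 0.0)
        grad = (-2.0 / c) * (w.sum(axis=1)[:, None] * Y - w @ Y)
        Y_try = Y - lr * grad
        s_try = sammon_stress(D, pairwise_dist(Y_try))
        if s_try < stress:
            # Accept the step and cautiously enlarge the step size.
            Y, stress = Y_try, s_try
            lr *= 1.2
        else:
            # Reject the step and shrink the step size.
            lr *= 0.5
    return Y, stress

# Synthetic data with 13 variables (illustrative only, not the survey data).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 13))
Y, stress = sammon_map(X)
```

Plotting the two columns of Y gives the attribute-space display. Points that lie far from the main cloud in such a plot are candidate outliers.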

Fuzzy concepts suit a reality that shows gradual variation in both multivariate and geographical space. The two methods, NLM and FKM, seem to be a natural pairing for fuzzy classification analysis, and should be used more widely in geographical information handling and environmental studies.

Fuzzy approaches have drawn wide attention from geographers (Burrough, 1990). It seems necessary to integrate some mature fuzzy approaches into GIS. Odeh et al. (1992) suggest that considerable benefits could accrue if the FKM method were integrated into GIS. We think loose coupling is a suitable way to incorporate new spatial analysis methods. In our research we use a loose interface between the GIS and the FKM and NLM algorithms, moving data files between them. In addition, users need to visualize the intermediate results and make judgements about the model and the parameters required for the next step of the analysis. This means the whole process has to be interactive. The results of FKM and NLM should be clearly displayed, in graphics and in text, according to the needs of users.

The author would like to thank Graeme Bonham-Carter of the Geological Survey of Canada for generously providing the geochemical data for northern Vancouver Island. Special thanks are due to Dr. Anthony C. Gatrell and Dr. Robin Flowerdew, who supervised this research.