Visual Exploration of Uncertainty in Remote Sensing Classifications

F.J.M. van der Wel¹, L.C. van der Gaag² and B.G.H. Gorte³
¹ Utrecht University, Faculty of Geographical Sciences, Cartography Section, P.O. Box 80.115, 3508 TC Utrecht, The Netherlands
² Utrecht University, Department of Computer Science, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands
³ International Institute for Aerospace Survey and Earth Sciences (ITC), P.O. Box 6, 7500 AA Enschede, The Netherlands

Introduction
Now that the concern for spatial data quality is becoming widespread (e.g. Guptill & Morrison, 1995), there is a great demand for operational measures to assess and convey that quality. A remote sensing classification contains an often unknown amount of uncertainty that affects the appropriateness of the data for a particular application. During a GIS overlay procedure, for example, the uncertain position of class boundaries can induce errors. The derivation of statistical quality information and the subsequent presentation of the accompanying uncertainty patterns are the subject of this paper. It elaborates on some research questions from the CAMOTIUS project¹, which stresses the potential of remotely sensed data for decision-making purposes. The paper focuses on the CAMOTIUS software.

Statistical quality measures
A maximum a posteriori classification is commonly used to extract land cover information from a remotely sensed data set. With this approach, a vector of posterior probabilities is calculated for each pixel, stating for each distinguished class $C_i$ the posterior probability of occurrence given some evidence $X$:

$$P(C = C_i \mid X), \qquad i = 1, \ldots, n$$

with $n$ the number of classes.

From these vectors, a number of statistical quality measures can be derived, such as the maximum posterior probability or the difference between the maximum and the second-largest posterior probability. From these values, the reliability of class assignments can be estimated. Uncertain areas (e.g. transition zones between classes) can be identified and, if necessary, subjected to selective further processing.
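
As a minimal sketch of the two measures named above, the following Python fragment derives the maximum posterior probability and its margin over the runner-up for one pixel. The function name `max_and_margin` and the example probability vectors are hypothetical and chosen only for illustration; they are not taken from the CAMOTIUS software.

```python
def max_and_margin(posteriors):
    """Return (class index, maximum posterior, margin over the runner-up) for one pixel."""
    if len(posteriors) < 2:
        raise ValueError("need at least two classes")
    # Rank class indices by decreasing posterior probability.
    ranked = sorted(range(len(posteriors)), key=lambda i: posteriors[i], reverse=True)
    best, second = ranked[0], ranked[1]
    return best, posteriors[best], posteriors[best] - posteriors[second]


# Hypothetical posterior vectors for two pixels and three classes.
pixels = {
    "interior": [0.90, 0.07, 0.03],    # clear class assignment
    "transition": [0.48, 0.45, 0.07],  # two classes compete
}

for name, vector in pixels.items():
    label, p_max, margin = max_and_margin(vector)
    print(f"{name}: class {label}, max = {p_max:.2f}, margin = {margin:.2f}")
```

A small margin, as for the second pixel, is the kind of signal that could be used to flag a transition zone for further processing; any threshold for doing so would have to be chosen per application.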

These measures consider only part of the posterior probability vector and, as a consequence, cannot capture the uncertainty expressed by the vector as a whole. For this purpose, we propose an additional concept: entropy. Entropy is a measure of uncertainty that is frequently employed in information theory (Kullback, 1959; Shannon, 1948). The entropy of a statistical variable is looked upon as the expected information content of a piece of data that is required to reveal the value of the variable with perfect accuracy. For each pixel, the entropy equals

$$H = -\sum_{i=1}^{n} P(C = C_i \mid X) \, \log_2 P(C = C_i \mid X)$$

where $P(C = C_i \mid X)$ refers to the posterior probability for class $C_i$. Informally speaking, the entropy of a statistical variable provides an indication of how much the probabilities of the variable's values diverge. The entropy of the variable is minimal if the uncertainty as to its true value has been resolved, that is, if one of its values has been established with certainty (e.g. 1-0-0 in case of three classes). The entropy is maximal if none of the variable's values is preferred over the others, that is, if the probabilities of the values are uniformly distributed (e.g. 1/3-1/3-1/3 in case of three classes).
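
The two extreme cases described above can be checked with a short Python sketch. It assumes base-2 logarithms, so that entropy is expressed in bits, and it treats terms with zero probability as contributing nothing; the function name `entropy` is chosen here for illustration.

```python
import math


def entropy(posteriors):
    """Shannon entropy (in bits, an assumption) of one pixel's posterior vector."""
    # Zero-probability terms are skipped: p * log(p) tends to 0 as p tends to 0.
    return sum(-p * math.log2(p) for p in posteriors if p > 0)


certain = entropy([1.0, 0.0, 0.0])        # one class established with certainty
uniform = entropy([1 / 3, 1 / 3, 1 / 3])  # no class preferred over the others

print("certain:", certain)
print("uniform:", uniform, "upper bound log2(3):", math.log2(3))
```

For n classes the entropy therefore ranges from 0 to log2(n), which gives a natural scale for comparing pixels within one classification.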

One of the main advantages of using entropy as a measure of uncertainty is its ability to summarise the quality information present in the entire posterior probability vector in one single number per pixel. Entropy therefore allows statistical quality information to be conveyed that reflects all classes, not only the most probable ones.
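
The difference between entropy and the simpler measures can be illustrated with a small, hypothetical example. The sketch below compares two invented four-class posterior vectors that share the same maximum probability and the same margin over the runner-up, yet differ in entropy. Base-2 logarithms and the helper name `summarise` are assumptions made for this illustration.

```python
import math


def entropy(posteriors):
    """Shannon entropy in bits (base 2 is an assumption of this sketch)."""
    return sum(-p * math.log2(p) for p in posteriors if p > 0)


def summarise(posteriors):
    """Return (maximum posterior, margin over runner-up, entropy) for one pixel."""
    ordered = sorted(posteriors, reverse=True)
    return ordered[0], ordered[0] - ordered[1], entropy(posteriors)


# Hypothetical pixels: identical first and second probabilities,
# but the remaining probability mass is spread differently.
pixel_a = [0.5, 0.3, 0.2, 0.0]
pixel_b = [0.5, 0.3, 0.1, 0.1]

for name, vector in (("a", pixel_a), ("b", pixel_b)):
    p_max, margin, h = summarise(vector)
    print(f"pixel {name}: max = {p_max:.2f}, margin = {margin:.2f}, entropy = {h:.3f} bits")
```

Both pixels look equally reliable when judged by the maximum or the margin alone; only the entropy registers that the second pixel leaves more classes open.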

Visualization
The effectiveness of quality information is highly dependent on the way in which it is presented to a user. Visualization techniques have been recognised as a means to handle large and complex data sets during exploratory analyses (e.g. MacEachren & Taylor, 1994). The exploration of quality information profits from the impact of multiple alternative visualizations, ranging from static to dynamic representations. As an example of the former, bivariate maps can be mentioned, reflecting both thematic class information and entropy. Dynamic visualizations refer to animated data sets, for example showing the changes in entropy over time (in case of time series analysis) or as computed from multiple alternative classifications. CAMOTIUS offers a number of dynamic visualization techniques to communicate the uncertainty information to a user.
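
One possible way to build a bivariate map of this kind is sketched below in Python: the thematic class selects the hue, and the entropy fades the colour towards white as uncertainty grows. This colour scheme, the class names, the hue values and the function name `bivariate_colour` are all assumptions made for illustration; the sketch does not describe how CAMOTIUS renders its maps.

```python
import colorsys
import math

# Hypothetical class-to-hue assignment (hue as a fraction of the colour wheel).
CLASS_HUES = {"urban": 0.0, "forest": 0.33, "water": 0.60}


def bivariate_colour(class_name, entropy_bits, n_classes):
    """Map (class, entropy) to an 8-bit RGB triple: hue = class, paleness = uncertainty."""
    if n_classes < 2:
        raise ValueError("need at least two classes")
    # Normalise entropy by its maximum, log2(n), and clamp to [0, 1].
    uncertainty = min(max(entropy_bits / math.log2(n_classes), 0.0), 1.0)
    # Lightness 0.5 gives the pure colour; 0.95 is close to white.
    lightness = 0.5 + 0.45 * uncertainty
    r, g, b = colorsys.hls_to_rgb(CLASS_HUES[class_name], lightness, 1.0)
    return tuple(round(255 * channel) for channel in (r, g, b))


print(bivariate_colour("urban", 0.0, 3))           # certain pixel: pure hue
print(bivariate_colour("urban", math.log2(3), 3))  # maximally uncertain: pale
```

Fading towards white is only one design choice; saturation, texture or a second map layer could carry the entropy instead, and the best choice depends on the intended user.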

Conclusions
The derivation and communication of information about the quality of a data set is an indispensable part of the information process, especially if the data are used in decision-making procedures. Providing users with suitable quality measures, such as entropy, and with visualization tools to evaluate a particular data set is of key importance to ensure sound application of those data.

References
Guptill, S.C. and Morrison, J.L. 1995. Elements of spatial data quality. Oxford: Pergamon.

Kullback, S. 1959. Information theory and statistics. New York: John Wiley & Sons.

MacEachren, A.M. and Taylor, D.R.F. 1994. Visualization in modern cartography. Oxford: Pergamon.

Shannon, C.E. 1948. "A mathematical theory of communication", Bell System Technical Journal, 27, 379-423.


¹ CAMOTIUS is a research project in which the Cartography Section of Utrecht University, the ITC, the National Physical Planning Agency (VROM-RPD) and Eurosense b.v. participate. Funding is provided by the Netherlands Remote Sensing Board (BCRS). The main objective is the development of a demonstration package showing the knowledge-supported classification of remotely sensed data and the derivation and subsequent visualization of quality information.