GeoComputation 99

Texture information and supervised classification of hyperspectral imagery by means of neural networks

Edward H. Bosch
U.S. Army Topographic Engineering Center, 7701 Telegraph Road, Alexandria, VA 22315-3864 U.S.A.
E-mail: edward.h.bosch@usace.army.mil

Abstract

Although computer memory and processing speed are rapidly becoming more affordable, hyperspectral sensors generate vast amounts of data that require efficient hardware and software resources to exploit and analyze. Numerous studies have addressed the problem of classifying hyperspectral data with statistical techniques that assume the data follow a particular distribution. Non-parametric methods such as neural networks have also been used to classify hyperspectral imagery. In many of these studies the neural network "learns" to separate the classes using only spectral information; in others, researchers use both spectral and spatial/texture information to classify the hyperspectral cube. In this work we use the backpropagation algorithm to train neural networks that perform a supervised classification of single-band spatial data obtained from a hyperspectral cube. The spatial data are analyzed by training several networks on data extracted from windows (kernels) of several sizes from the original hyperspectral cube. The networks are to discern six classes. We show that in some cases, when classifying hyperspectral imagery with neural networks, spatial data have advantages over spectral data. This becomes evident when the user must include in the classification scheme the bands of the hyperspectral cube that are most pertinent to the classes in question. This information may not be available, so the user may decide to include most or all of the bands in the network's training data set, which generates bigger architectures that require more processing time and more iterations to converge. Varying the size of the training-data window also allows us to analyze its effect on the boundaries between two or more classes.

1. Introduction

The backpropagation algorithm is a simple non-parametric steepest descent method that requires no a priori knowledge of the distribution of the data. It has been used extensively to classify multispectral and hyperspectral data using spectral and spatial (texture, GIS layers) information. Skidmore et al. (1997) used both spectral and spatial information: windows of several sizes were used to compute measures of texture (variance and skew), and these measures were used as input nodes of the network's input layer, along with the multispectral bands and the GIS layers (elevation, slope, aspect, topographic position, geology, rainfall, and temperature). Like Skidmore et al., Bruzzone et al. (1997) used spectral information, measures of texture, and ancillary data as input nodes to the neural network. Paola and Schowengerdt (1997), as well as Kamata and Kawaguchi (1993), used a 3 by 3 window of spatial data for each spectral band of their multispectral data sets. These methods merged spatial and spectral data to classify the multispectral data sets; moreover, the spatial information consisted not of measures of texture but of the texture itself. Foschi and Smith (1997) used simulated multispectral data and other image-derived features. Rand (1995), Bosch (1995, 1997) and Bosch and Shine (1996) used only hyperspectral signatures in their analyses. Other, more sophisticated neural network methods, such as those of Raghu et al. (1995) and Goltsev (1996), have been used for texture classification and texture segmentation, respectively; although these methods may be applied to multispectral data, they were not tested on such data sets.

In this experiment, the author proposes to train networks with the backpropagation algorithm on spatial (not spectral) data extracted from a hyperspectral data set. The spatial data are derived from windows of several sizes; as a consequence, the network architectures differ. Training on hyperspectral and spatial data together tends to produce large networks (with many variables) that may need many iterations to converge. Using spatial data provides a means of generating smaller architectures, which has a positive effect on the rate of convergence of the backpropagation algorithm. We also analyze how varying the size of the training-data window affects the classification results.

1.1 Multilayer Feedforward Backpropagation Algorithm

The multilayer feedforward backpropagation algorithm is a steepest descent numerical method that minimizes an error function. In this study the method has two stages: training and classification. In the training stage, the network learns to recognize a subset of the whole data set being analyzed. In this context, learning means obtaining the set of parameters (weights) that make up the network itself; obtaining them is the task of the backpropagation algorithm. Once training has been performed and the network obtained, the network is used to validate/classify a test data set and/or the rest of the data.

In its simplest form the network is organized into two layers, an input layer and an output layer. The input layer is a vector containing the information being modeled by the neural network. The output layer (another vector) is compared with an ideal desired output, which is chosen by the user. To obtain the output from the input, a set of weights is applied to the input. The essence of the backpropagation algorithm lies in obtaining these weights; this is the training portion of the algorithm. The error function is the sum of the squared differences between the components of the desired output and the output produced by the network, that is, the squared Euclidean distance between the desired output and the computed output. Since the output layer is a function of the weights, the error is also a function of the weights; we therefore compute a set of weights that minimizes the error between the computed and the desired output. This is done iteratively, by moving in the direction of the negative gradient (steepest descent) of the error. In mathematical notation we have

Input layer: $x_0$

Output layer: $x_1 = G(W x_0)$

Error function: $E(W) = \lVert d - x_1 \rVert^2$,

where $d$ is the desired output, $W$ is the weight matrix whose entries are adjusted to minimize the error $E(W)$, and $G(x) = 2/(1 + e^{-x}) - 1$ is the bipolar sigmoid function, applied componentwise to each element of the vector $W x_0$. To obtain these weights we compute the gradient of the error function, $\nabla E(W)$, with respect to the individual elements of the weight matrix $W$, and iterate the following system

$$W_{n+1} = W_n - c\,\nabla E(W_n).$$

The negative gradient, $-\nabla E(W)$, points in the direction in which $E(W)$ decreases most rapidly, while the learning parameter $c$ determines the size of each step, that is, the rate at which the weights are modified.
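To make the update rule concrete, here is a minimal NumPy sketch of the two-layer form above for a single training pair $(x_0, d)$. The gradient is written out by the chain rule, using the fact that the bipolar sigmoid satisfies $G'(v) = (1 - G(v)^2)/2$. The function names are hypothetical; this illustrates plain steepest descent under those assumptions and is not the author's code.

```python
import numpy as np


def G(v):
    """Bipolar sigmoid G(v) = 2 / (1 + exp(-v)) - 1, applied componentwise."""
    return 2.0 / (1.0 + np.exp(-v)) - 1.0


def G_prime(v):
    """Derivative of G, written as (1 - G(v)**2) / 2."""
    g = G(v)
    return 0.5 * (1.0 - g * g)


def error(W, x0, d):
    """E(W) = ||d - G(W x0)||^2 for the input/output network."""
    x1 = G(W @ x0)
    return float(np.sum((d - x1) ** 2))


def gradient(W, x0, d):
    """Grad E(W): dE/dW[j, k] = -2 (d_j - x1_j) G'(net_j) x0_k."""
    net = W @ x0
    x1 = G(net)
    delta = -2.0 * (d - x1) * G_prime(net)  # one entry per output node
    return np.outer(delta, x0)


def steepest_descent_step(W, x0, d, c=0.1):
    """One iteration W_{n+1} = W_n - c * Grad E(W_n)."""
    return W - c * gradient(W, x0, d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.uniform(-1.0, 1.0, size=26)            # e.g. a 26-component input
    d = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 0.0])    # desired output for class 4
    W = rng.normal(scale=0.1, size=(6, 26))
    print("E before:", error(W, x0, d))
    for _ in range(200):
        W = steepest_descent_step(W, x0, d, c=0.1)
    print("E after: ", error(W, x0, d))
```

A finite-difference check of `gradient` against `error` is an easy way to confirm the derivative before using it in a longer training run.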

Choosing a proper learning parameter has been the subject of many studies, and many methods have been proposed to accelerate the convergence of the algorithm; Zurada (1992), among others, discusses the momentum method. Although these methods do not guarantee a global minimum, Barhen et al. (1994) have designed an algorithm that does "overcome local minimum." These and other methods have produced encouraging results when applied to standard benchmarks. It would be worthwhile to apply them to hyperspectral data and measure how much more information is obtained from a network at a global minimum than from a network at only a local minimum. Nevertheless, researchers are generally satisfied with the networks produced by the backpropagation algorithm and its variations; that is, they do not expect the algorithm to find the set of weights $W$ that globally minimizes the error function $E(W)$.

Once the weights $W$ have been obtained by the method described above, we apply them to a test data set and/or the rest of the data for validation. This is the classification stage, which is generally not time consuming since the weights have already been obtained. For more information on layered networks see Werbos (1974), Rumelhart and McClelland (1986), Zurada (1992), and Haykin (1994).
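The networks used in Section 2 have one hidden layer ($x_0 \to x_1 \to x_2$). The sketch below extends the same steepest-descent idea to that case and separates the training and classification stages. Several details are assumptions for illustration and are not given in the paper: per-sample (online) updates, the random initialization, no bias component in the hidden layer, and the names `train`, `classify` and `total_error`. The constant factor 2 from the squared error is absorbed into the learning parameter `c`.

```python
import numpy as np


def G(v):
    """Bipolar sigmoid, applied componentwise."""
    return 2.0 / (1.0 + np.exp(-v)) - 1.0


def forward(V, W, x0):
    """Input x0 -> hidden x1 = G(V x0) -> output x2 = G(W x1)."""
    x1 = G(V @ x0)
    return x1, G(W @ x1)


def train(X, D, n_hidden, c=0.2, n_iter=3000, seed=0):
    """Training stage. X: one augmented input per row; D: desired outputs."""
    rng = np.random.default_rng(seed)
    V = rng.normal(scale=0.5, size=(n_hidden, X.shape[1]))  # input -> hidden
    W = rng.normal(scale=0.5, size=(D.shape[1], n_hidden))  # hidden -> output
    for _ in range(n_iter):
        for x0, d in zip(X, D):
            x1, x2 = forward(V, W, x0)
            # Error terms; 0.5 * (1 - y**2) is G' expressed through y = G(v).
            delta_out = (d - x2) * 0.5 * (1.0 - x2 ** 2)
            delta_hid = (W.T @ delta_out) * 0.5 * (1.0 - x1 ** 2)  # uses W before its update
            W += c * np.outer(delta_out, x1)
            V += c * np.outer(delta_hid, x0)
    return V, W


def total_error(V, W, X, D):
    """Sum over samples of ||d - x2||^2."""
    return sum(float(np.sum((d - forward(V, W, x0)[1]) ** 2)) for x0, d in zip(X, D))


def classify(V, W, X):
    """Classification stage: label = 1-based index of the largest output."""
    return [int(np.argmax(forward(V, W, x0)[1])) + 1 for x0 in X]


if __name__ == "__main__":
    X = np.array([[1.0, 0.0, -1.0], [0.0, 1.0, -1.0]])  # toy inputs, last entry = bias
    D = np.eye(2)                                        # one desired vector per class
    V, W = train(X, D, n_hidden=4)
    print(classify(V, W, X), total_error(V, W, X, D))
```

Once `train` has returned, `classify` only needs forward passes, which is why the classification stage is cheap compared with training.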

1.2 Hyperspectral data

The hyperspectral cube consists of 210 bands spanning 400 nm to 2500 nm. It was calibrated to reflectance and recorded from an altitude of 10,000 feet, with a ground resolution of 1.5 m per pixel. The scene contains 604 rows by 187 columns, and each pixel is 16 bits. The data used in the training and classification stages were remapped from 16 bits to 8 bits. In the experiments we used bands 30 (516.195 nm), 31 (521.503 nm), 32 (526.937 nm) and 33 (532.502 nm).
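The paper does not say how the 16-bit values were remapped to 8 bits. One common choice is a linear min-max stretch, sketched below purely as an assumption; a percentile-clipped or per-band stretch would be equally plausible, and the name `remap_to_8bit` is hypothetical. For scale, at 16 bits per sample the 604 × 187 × 210 cube occupies 604 · 187 · 210 · 2 = 47,438,160 bytes (about 47 MB), and half that at 8 bits.

```python
import numpy as np


def remap_to_8bit(cube16):
    """Linearly stretch 16-bit data to 0..255 using the global min and max.

    This is an assumed method; the paper does not state how it remapped the data.
    """
    data = cube16.astype(np.float64)
    lo, hi = data.min(), data.max()
    if hi == lo:  # constant input: avoid dividing by zero
        return np.zeros(cube16.shape, dtype=np.uint8)
    scaled = (data - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)


if __name__ == "__main__":
    rows, cols, bands = 604, 187, 210
    print("16-bit cube size in bytes:", rows * cols * bands * 2)
    demo = np.array([[0, 1000], [2000, 4000]], dtype=np.uint16)
    print(remap_to_8bit(demo))
```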

1.3 Training data sets

The scene is not very complex and is composed mainly of trees, grass and bare soil. All training data sets were derived from band 30 (roughly 516.19 nm) of the hyperspectral cube, and all of them comprise the following six classes:

Class No.   Class Description
1           Deciduous trees
2           Exposed dark soil
3           Exposed bright soil
4           Grass covered soil
5           Tilled soil with vegetation
6           Road

See Figure 1 for the class colors and labels. Landgrebe (1997) states, "The most logical way of quantitatively defining the classes of interest in a given analysis situation is via design or training samples drawn directly from the data set to be analyzed." Drawing training samples directly from the scene also reduces problems associated with calibration: although the data used in these experiments were calibrated to reflectance units, in some of our previous experiments we have effectively used uncalibrated data to generate training data sets and then classify the remaining data.

2. Generating the networks

In this study, four different networks were obtained by training them on four spatial data sets, all derived from band 30 (roughly 516.19 nm) of the hyperspectral data cube. We describe them as follows:

Training set   Pixel window size   No. of samples   NN architecture (input-hidden-output)
1              5x5                 1130             26-14-6
2              7x7                 1130             50-14-6
3              9x9                 1130             82-10-6
4              11x11               2672             122-14-6

Training set 1 is obtained by generating samples from 5x5 windows in band 30. Each sample generated from a 5x5 window consists of the spatial data associated with one of the six classes. Since the window is 5x5, the input layer of the network has 25 components plus one more for the augmented case, for a total of 26: the 25 pixel values of the window are stored in a 26-component vector whose additional component is a constant (bias) input, included so that the threshold of each node can be learned as an ordinary weight.
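Building one input vector can be sketched as follows, assuming the band is a 2-D NumPy array and the window is centred on the sample pixel. The constant -1 used as the augmented component is a common textbook convention and an assumption here, as is the row-by-row ordering of the pixels; the paper states neither. The function name `window_input` is hypothetical.

```python
import numpy as np


def window_input(band, row, col, size, bias=-1.0):
    """Return the size*size pixels centred on (row, col), flattened row by row,
    followed by one constant bias component (the augmented input)."""
    if size % 2 == 0:
        raise ValueError("window size must be odd so it has a centre pixel")
    half = size // 2
    if not (half <= row < band.shape[0] - half and half <= col < band.shape[1] - half):
        raise ValueError("window extends past the image border")
    window = band[row - half:row + half + 1, col - half:col + half + 1]
    return np.append(window.astype(np.float64).ravel(), bias)


if __name__ == "__main__":
    band = np.arange(400).reshape(20, 20)  # toy 20x20 "band"
    for size in (5, 7, 9, 11):
        x0 = window_input(band, 10, 10, size)
        print(f"{size}x{size} window -> {x0.size} input components")
```

The component counts this prints (26, 50, 82, 122) match the input layers listed in the table above.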

In these experiments, in addition to the input and output layers, the networks have a hidden layer. Additional layers are necessary for classes whose decision surfaces are difficult to separate (for example, classes that are not linearly separable). Choosing the number and size of hidden layers is still highly debated. For training set 1 the hidden layer has 14 components and the output layer has 6. We kept the hidden layer fairly small so that the total number of network variables (weights) remained small relative to the number of training samples. As the number of components in the layers increases, so does the number of variables in the network. A high number of dimensions in the layers, particularly in the input space, requires a large number of training samples for the model to characterize the data well; however, when neural networks are used to model high-dimensional input spaces such as hyperspectral data, the resulting network sometimes has more variables than there are available samples. This is undesirable because infinitely many solutions may then fit the training data, in addition to the local minima the algorithm may reach. Nevertheless, in some studies this has been done and the resulting networks generalized well enough to the remaining samples of the data sets in question. The situation can be avoided by choosing a different (smaller) network architecture.
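A quick count supports the point about keeping the number of variables below the number of training samples. Assuming fully connected layers whose sizes in the table above already include any bias component, the number of weights is $n_{in} n_{hidden} + n_{hidden} n_{out}$; the snippet applies this to the four architectures and sample counts listed in the table.

```python
# Architectures (input, hidden, output) and sample counts from the table above.
ARCHITECTURES = {
    "5x5": ((26, 14, 6), 1130),
    "7x7": ((50, 14, 6), 1130),
    "9x9": ((82, 10, 6), 1130),
    "11x11": ((122, 14, 6), 2672),
}


def weight_count(layers):
    """Weights in a fully connected feedforward net with the given layer sizes."""
    return sum(a * b for a, b in zip(layers[:-1], layers[1:]))


if __name__ == "__main__":
    for name, (layers, n_samples) in ARCHITECTURES.items():
        n_w = weight_count(layers)
        print(f"{name:>5}: {n_w:4d} weights for {n_samples:4d} samples "
              f"({n_samples / n_w:.2f} samples per weight)")
```

Under this counting assumption the networks have 448, 784, 880 and 1792 weights, so every training set has more samples than weights (roughly 1.3 to 2.5 samples per weight).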

The output layer has 6 components, the same as the number of classes, because each class is assigned its own desired output vector:

Class   Desired output vector
1       d1 = [1 0 0 0 0 0]'
2       d2 = [0 1 0 0 0 0]'
3       d3 = [0 0 1 0 0 0]'
4       d4 = [0 0 0 1 0 0]'
5       d5 = [0 0 0 0 1 0]'
6       d6 = [0 0 0 0 0 1]'

That is, the samples of class 1 are associated with the desired output vector d1 = [1 0 0 0 0 0]', and so on.

Note that as x increases without bound in the positive and negative directions, the sigmoid function G(x) approaches +1 and -1, respectively; these values are reached only in the limit. Nevertheless, we chose the extreme value +1, rather than a value less than one, as the nonzero component of each desired output vector above. Because the algorithm can never reach this value, it is forced to perform more iterations. An unnecessarily large number of iterations tends to make the algorithm "learn" the training data set very well but with a limited capacity to generalize. Nevertheless, imposing such values does generate a better network, as reported by Skidmore et al. (1997). We show an example of this below.
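Because $G(v) = 2/(1 + e^{-v}) - 1$ equals $\tanh(v/2)$, the net input needed to produce an output $y$ is $v = 2\,\mathrm{artanh}(y)$, which grows without bound as $y$ approaches 1. The sketch below builds the desired output vectors d1 to d6 as rows of an identity matrix and prints the net input required for a few target values, illustrating why a target of exactly +1 keeps the algorithm iterating. The names are illustrative only.

```python
import numpy as np

N_CLASSES = 6
DESIRED = np.eye(N_CLASSES)  # row i - 1 is d_i; e.g. DESIRED[3] is d4 = [0 0 0 1 0 0]


def G(v):
    """Bipolar sigmoid, identical to tanh(v / 2)."""
    return 2.0 / (1.0 + np.exp(-v)) - 1.0


def required_net_input(y):
    """Net input v such that G(v) = y, valid for -1 < y < 1."""
    return 2.0 * np.arctanh(y)


if __name__ == "__main__":
    print("d4 =", DESIRED[3])
    for y in (0.9, 0.971, 0.99, 0.999, 0.999999):
        print(f"output {y}: net input {required_net_input(y):.3f}")
```

Each additional "9" in the target output costs roughly another 4.6 units of net input, so the weights must keep growing to approach +1.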

Once training has been done, class assignment is performed by computing

$$\max_{i=1,\dots,6} x_2(i)$$

for the training data set and

$$\min_{i=1,\dots,6} \lVert x_2 - d_i \rVert^2$$

for the images, where $x_2$ is the network's output layer and $\lVert x_2 - d_i \rVert$ is the Euclidean distance between the computed output and the desired output $d_i$. Recall that the networks in these experiments have input, hidden and output layers. The maximum component of $x_2$ must be greater than or equal to a threshold value, and $\min_i \lVert x_2 - d_i \rVert^2$ must be less than or equal to a threshold value; otherwise the sample is left unlabeled. Using one of the class 4 samples of the training data set, we show the difference between a network produced by 20,000 iterations and one produced by 100,000 iterations. The desired output vector for a sample belonging to class 4 is d4 = [0 0 0 1 0 0]'.

Samples belonging to this class should have the largest output component at index 4, with the remaining components as close to 0 as possible. After 20,000 iterations, for a class 4 sample of the training set derived from the 9x9 windows, the network produced this output:

x2=[-0.201 -0.008 -0.022 +0.901 +0.574 +0.023]'.

Note that the maximum occurs at index 4 (+0.901), but index 5 (+0.574) is not as close to 0 as it should be. This contributes a large error and decreases the network's capacity to generalize. In contrast, after 100,000 iterations the network produced the following output for the same sample:

x2=[-0.014 +0.013 -0.010 +0.971 +0.049 +0.007]'.

Again, the maximum occurs at index 4 (+0.971), but in contrast to the previous result, index 5 (+0.049) is much closer to 0. The additional iterations improved the output at index 5 far more than the output at index 4. When a network's outputs do not behave like the second example, its capacity to generalize is greatly diminished; therefore, to increase the network's accuracy it may be necessary to increase the number of iterations. In our experiments we did not encounter problems with overtraining (an excess of iterations).
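The two assignment rules can be sketched as follows and applied to the two outputs quoted above. This assumes the "2" in $\lVert x_2 - d_i \rVert^2$ denotes a square. The distance threshold 0.3 is the value used in Section 2.2; the max-rule threshold of 0.5 is hypothetical, since the paper does not give its value. Label 0 stands for "unlabeled".

```python
import numpy as np

DESIRED = np.eye(6)  # row i - 1 is the desired output d_i
UNLABELED = 0

# Outputs for the same class 4 training sample, quoted in the text.
X2_20K = np.array([-0.201, -0.008, -0.022, 0.901, 0.574, 0.023])
X2_100K = np.array([-0.014, 0.013, -0.010, 0.971, 0.049, 0.007])


def label_by_max(x2, threshold=0.5):
    """Class of the largest component, if that component reaches the threshold."""
    i = int(np.argmax(x2))
    return i + 1 if x2[i] >= threshold else UNLABELED


def label_by_distance(x2, threshold=0.3):
    """Class whose desired vector is nearest (squared Euclidean distance),
    if that distance is below the threshold."""
    dist = np.sum((DESIRED - x2) ** 2, axis=1)  # one distance per class
    i = int(np.argmin(dist))
    return i + 1 if dist[i] < threshold else UNLABELED


if __name__ == "__main__":
    for name, x2 in (("20,000 iterations", X2_20K), ("100,000 iterations", X2_100K)):
        print(name, label_by_max(x2), label_by_distance(x2))
```

With these settings the max rule labels both outputs as class 4, but the distance rule leaves the 20,000-iteration output unlabeled: its squared distance to d4 is about 0.381, mostly from the +0.574 at index 5. After 100,000 iterations the squared distance drops to about 0.004 and both rules agree on class 4.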

The remaining training sets were obtained in the same way as the training set derived from the 5x5 windows. We attempted, without success, to obtain a neural network for a training set generated by extracting data from 3x3 windows in band 30. Two reasons may account for this. First, with only 9 components in the input layer (plus the augmented component), one hidden layer, and six classes to discern, this network may not have had the capacity to learn the six classes properly. By contrast, in a separate experiment in which we decreased the number of classes from 6 to 3, the algorithm converged and provided a network. Second, the samples comprising the training data set may not have contained enough useful spatial information for the algorithm to converge; moreover, some of the classes were similar to each other. The algorithm did converge for samples extracted from larger windows.

2.1 Accuracy of networks

Using band 30, we generated four networks by training each one on data sets derived from windows of sizes 5x5, 7x7, 9x9 and 11x11, respectively. We also generated data sets from bands 31 through 33 that were spatially correlated with band 30's training data set, in order to assess each network's capacity to generalize spatial information. Furthermore, a given network may respond to a particular sample consistently across several bands of the hyperspectral cube, which provides a mechanism to validate the sample's class membership. Training a network with spatial (and/or spectral) information may be significant if the network is to be applied to multitemporal data sets to assess ground cover changes over time.

Before we proceed to discuss the results, below we show the correlation coefficients among bands 30 - 33 and their corresponding mean values.

Correlation Coefficients for Bands 30 - 33

Band   30       31       32       33
30     1.0000   0.9983   0.9956   0.9910
31     0.9983   1.0000   0.9986   0.9956
32     0.9956   0.9986   1.0000   0.9985
33     0.9910   0.9956   0.9985   1.0000

Mean Values for Bands 30 - 33

Band   30        31        32        33
Mean   26.7606   29.3317   31.7875   33.7443

The bands are highly correlated (as expected), and their mean values increase slightly, indicating that the overall brightness changes from band to band. As a consequence, the magnitudes of the spatial samples in band 30 differ from those of the corresponding samples in the other bands. To compensate for this slight increase in gray-shade magnitude, we multiply bands 31, 32 and 33 by 26.7606/29.3317, 26.7606/31.7875 and 26.7606/33.7443, respectively (the ratio of band 30's mean to each band's mean), and then compute the accuracy of the network. Below we show the accuracy of the network trained on 5x5 spatial samples from band 30, and its accuracy when applied to bands 31 - 33.
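The rescaling step amounts to one multiplicative factor per band. A minimal sketch using the band means from the table above; the helper names are hypothetical, and since the paper does not say whether the rescaled values were rounded back to 8-bit integers, the sketch keeps floating-point values.

```python
import numpy as np

# Band means from the table above.
BAND_MEANS = {30: 26.7606, 31: 29.3317, 32: 31.7875, 33: 33.7443}
REFERENCE_BAND = 30


def scale_factor(band, means=BAND_MEANS, ref=REFERENCE_BAND):
    """Ratio of the reference band's mean to this band's mean."""
    return means[ref] / means[band]


def rescale(image, band, means=BAND_MEANS, ref=REFERENCE_BAND):
    """Multiply a band image by its factor so its mean matches the reference band's."""
    return image.astype(np.float64) * scale_factor(band, means, ref)


if __name__ == "__main__":
    for b in (31, 32, 33):
        print(b, round(scale_factor(b), 4))
```

The factors come out at roughly 0.9123, 0.8419 and 0.7930 for bands 31, 32 and 33.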

Window Size 5x5

Spectral Band   Accuracy
30              99.65%
31              98.50%
32              94.60%
33              88.05%

The accuracy of this network (5x5 window data set) on its own training data set is 99.65%. Ideally we would obtain 100% accuracy; however, some of the confused samples belonged to similar classes. The table also shows that the network retains the capacity to recognize the spatial structure in other bands, as long as those bands are rescaled by the proper factors. Had the bands not been rescaled, the results would have been drastically different. Although these networks are designed to discriminate spatial information (one band at a time), we can also generate a network that discriminates spectral information and combine both classification results. Combining the results of several classifiers may provide a better mechanism for determining a sample's class membership than using a single classifier.

In a previous study, Bosch (1997) used spectral information to classify multitemporal hyperspectral data sets. The results were not unreasonable, given that the data sets were acquired over the same area but on different dates and from different altitudes; the different altitudes made it more difficult for the network to separate the classes. Although in the current study we have not applied the networks to multitemporal spatial data sets, the sensor provides pixels with better spatial (and spectral) resolution than, for example, Landsat imagery, so the pixels defining a boundary are better resolved. As we will show, depending on the window size used to generate a network, this boundary can be just one pixel wide. Landsat's spatial resolution is too coarse for such refined analyses.

We now show the accuracy of the networks trained on spatial samples from 7x7, 9x9 and 11x11 windows in band 30, and their accuracy when applied to bands 31 - 33.

Accuracy by Window Size

Spectral Band   7x7 Window   9x9 Window   11x11 Window
30              100.00%      100.00%      99.81%
31              99.99%       100.00%      99.40%
32              98.05%       99.47%       98.88%
33              93.72%       97.35%       95.96%

The networks trained on spatial samples from 7x7 and 9x9 windows in band 30 correctly classified 100% of their training data sets, as expected. The network trained on 11x11 windows from band 30 classified its training set with an accuracy of 99.81%. In all cases, the samples belonging to the road class were classified correctly. This may be because the road appears very bright in these bands compared with the rest of the image, which left little room for the network to produce incorrect results.

As these results show, the window size affects each network's classification rate. The size of the objects in question dictates the size of the window; for the data in this experiment and the resolution of the image, these window sizes appear to be appropriate. Nevertheless, rather than classifying the data with a single network, using several networks trained on samples from windows of different sizes provides a better way to validate the label assigned to a pixel. We discuss this further below.

The results above indicate that the network trained on samples from the 9x9 windows in band 30 performed best. This network was also applied to the data sets corresponding to bands 40, 56, 94 and 121; the results are shown below.

Window Size 9x9

Spectral Band   Accuracy
40              99.38%
56              60.62%
94              24.25%
121             62.21%

The differences in accuracy are due to each band's nonlinear variation from band 30's mean gray-shade value. Recall that we rescaled the data set obtained from each band before applying the network. For bands contiguous to band 30, the effect of the scaling factor is not as noticeable as for bands farther away. Great care must be taken when applying this method to multitemporal data sets; that is, the corresponding data sets have to be scaled properly.

Although changing the order of the pixels within each window does not change order-independent statistics such as the mean and variance, the physical location of each pixel within the window is very important for training the network properly. In a separate experiment we randomly reordered the pixels within each 5x5 training window, effectively generating new samples, and the algorithm did not converge on this training data set. The lack of convergence was due to the change in the structure of the data being modeled. This may not be obvious when small windows are considered, particularly when the data in those windows consist of tree canopy: one might expect that reordering the pixels of a small window of tree canopy would have little effect. Nevertheless, reordering the pixels within the windows removes exactly the information the network is modeling.
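The reordering experiment can be illustrated on a toy window: a random permutation of the pixels leaves the mean and variance unchanged, yet the flattened input vector the network sees is different. The toy window and the helper name `shuffle_window` are illustrative assumptions, not the paper's data or code.

```python
import numpy as np


def shuffle_window(window, rng):
    """Randomly reorder the pixels of a window, keeping its shape."""
    return rng.permutation(window.ravel()).reshape(window.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    window = np.arange(25, dtype=np.float64).reshape(5, 5)  # toy 5x5 window
    shuffled = shuffle_window(window, rng)
    print("mean:", window.mean(), shuffled.mean())
    print("variance:", window.var(), shuffled.var())
    print("same input vector:", np.array_equal(window.ravel(), shuffled.ravel()))
```

Statistics that depend on pixel arrangement, such as differences between neighbouring pixels, do change under the permutation, and that arrangement is what the network is trained on.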

2.2 Classification map of the four networks applied to Band 30

In this experiment we took the four networks trained on samples from 5x5, 7x7, 9x9 and 11x11 windows in band 30 and used them to classify band 30. Class membership of each 9x9 window of band 30 was inferred by computing $\min_{i=1,\dots,6} \lVert x_2 - d_i \rVert^2$, and we required this value to be less than the threshold 0.3. With this threshold, all networks agreed on 58.25% of the samples. A smaller threshold would have made the networks disagree on more samples, but the resulting classification map would have been more reliable. Furthermore, using four networks (as opposed to one) to obtain a class map increases the probability of labeling a sample correctly. All networks disagreed on only 0.00490% of the samples. Figure 1 shows a portion of the area in question, and the classification map produced by this experiment is shown in Figure 2. Any colored pixel (including black) indicates where all methods agreed; black indicates that the network could not infer class membership for that sample. There are very few unclassified areas where the tree canopy is located, because the wooded area is fairly homogeneous, which allowed the networks to label these samples without much confusion. The networks did not agree on the labels of the roads running from left to right, since the roads are narrow compared with the surrounding area: these windows did not provide enough road information for the networks to infer class membership. Nevertheless, some of the networks (the smaller ones) did classify these roads correctly. The network trained on samples from the 11x11 windows produced a smoother class map than the other networks: the more information a network's windows contain about a homogeneous area and its surroundings, the smoother the continuity of the class map becomes.
In contrast, we observed that with large windows containing several classes, some of the networks missed parts of the roads and also mislabeled bright areas as roads. Networks trained on smaller windows are not necessarily the answer to this problem since, as we saw, some of these networks cannot "learn" to recognize these classes because the windows do not contain enough information. A possible solution is to set very stringent error thresholds, forcing many points to remain unlabeled, and then to use several networks and/or classification methods to produce different class maps, ultimately producing a class map in which each pixel has received the same label from every classification method or network. Pixel demixing methods, using spectral or spatial information, may then be used to label the samples that remained unlabeled.
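The agreement bookkeeping used in this experiment and the next can be sketched as follows, assuming each class map is an integer array with labels 1 to 6 and 0 for unlabeled samples (an illustrative encoding). The same function covers both setups: four networks applied to one band, or one network applied to four bands. It returns a consensus map (the common label where every map agrees, 0 elsewhere), the fraction of pixels on which all maps agree, and the fraction on which every map gives a different label. The function name `agreement` is hypothetical.

```python
import numpy as np

UNLABELED = 0  # illustrative code for "no class assigned"


def agreement(label_maps):
    """Combine class maps of identical shape.

    Returns (consensus, frac_all_agree, frac_all_differ).
    """
    stack = np.stack(label_maps)  # shape (n_maps, rows, cols)
    n_maps = stack.shape[0]
    first = stack[0]
    all_agree = np.all(stack == first, axis=0)
    # Distinct labels per pixel: sort along the map axis and count the changes.
    ordered = np.sort(stack, axis=0)
    n_distinct = 1 + np.count_nonzero(np.diff(ordered, axis=0), axis=0)
    all_differ = n_distinct == n_maps
    consensus = np.where(all_agree, first, UNLABELED)
    return consensus, float(all_agree.mean()), float(all_differ.mean())


if __name__ == "__main__":
    maps = [np.array([[1, 2], [3, 0]]), np.array([[1, 2], [4, 0]]),
            np.array([[1, 5], [5, 0]]), np.array([[1, 2], [6, 0]])]
    consensus, agree, differ = agreement(maps)
    print(consensus)
    print("all agree:", agree, "all differ:", differ)
```

As in the figures, a pixel that every map leaves unlabeled counts as agreement; it simply keeps the unlabeled code in the consensus map.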

2.3 Classification map of the 9x9 window network applied to Bands 30 - 33

In this experiment we applied the network trained on spatial samples from 9x9 windows in band 30 to bands 30 - 33; in contrast to the previous experiment, a single network classified four bands. We found that 68.35% of the samples received the same label in all four bands, and the network never assigned four different labels to a sample. This method may seem less reliable than that of the previous experiment, since only one network is used to classify several bands. Nevertheless, as the gap between bands increases and their correlation coefficient drops, the network may reveal information that would otherwise go unnoticed. This occurred at the boundary between the forest and the road, where the forest cast its shadow on the ground. In bands 30 - 33 the network could not classify these samples; they were therefore labeled unknown and appear as a thin line almost one pixel wide (see Figure 3). The response (output vector) for some of these samples in band 30 was similar to the following:

x2=[0.794 0.016 -0.067 0.836 0.008 0.029]'.

Note that index 1 (deciduous trees) and index 4 (grass covered soil) have the highest magnitudes. The sample more likely belongs to the tree class, but the highest response comes from an unlikely choice, grass covered soil. Although we have not made a decision about the nature of this sample, the fact that it was labeled unknown in bands 30 - 33 helped us identify the boundary between the trees and the road.

Figure 1

Figure 1 shows a portion of the image being analyzed, together with the numerical labels and colors associated with each of the six training classes:

Class No.   Class Description
1           Deciduous trees
2           Exposed dark soil
3           Exposed bright soil
4           Grass covered soil
5           Tilled soil with vegetation
6           Road

Figure 2

Figure 2 corresponds to the results obtained from training four networks on data sets derived from band 30 from windows of sizes 5 X 5, 7 X 7, 9 X 9, and 11 X 11, respectively. These 4 networks were applied to band 30, and as a consequence, four class maps were obtained; i.e., four networks applied to one band. Any pixel with a color other than black shows where all class maps agreed on the same label. A pixel whose color is black shows where at least one of the class maps disagreed.

Figure 3

Figure 3 corresponds to the results obtained from training a network on a data set extracted from a 9x9 window from band 30, and then applying such network to bands 30 - 33. That is, one network applied to four bands. The labeling scheme is as in Figure 2.

3. Conclusions

We have used neural networks to classify the spatial information of single-band hyperspectral data. The accuracy of these networks was assessed by applying them to spatially correlated samples from several bands other than the one used to train the networks. In one experiment, four networks were used to classify band 30 of the hyperspectral data set, producing a class map of the image that associates each pixel on the ground with a label. Using several networks increases the chance of generating an accurate class map; further experiments may include the use of several classification methods. In the second experiment we classified bands 30 - 33 with one of the networks and produced a class map of the hyperspectral image. We then determined which samples received the same label in all four bands and which received different labels. This provided information about unlabeled samples that would otherwise have gone unnoticed.

References

A. K. Skidmore, B. J. Turner, W. Brinkhof and E. Knowles, 1997. "Performance of a Neural Network: Mapping Forests Using GIS and Remotely Sensed Data," Photogrammetric Engineering and Remote Sensing, Vol. 63, No. 5, pp. 501-514.

L. Bruzzone, C. Conese, F. Maselli and F. Roli, 1997. "Multisource Classification of Complex Rural Areas by Statistical and Neural-Network Approaches," Photogrammetric Engineering and Remote Sensing, Vol. 63, No. 5, pp. 523-533.

J.D. Paola and R.A. Schowengerdt, 1997. "The Effect of Neural-Network Structure on a Multispectral Land-Use/Land-Cover Classification," Photogrammetric Engineering and Remote Sensing, Vol. 63, No. 5, pp. 535-544.

P.G. Foschi and D.K. Smith, 1997. "Detecting Subpixel Woody Vegetation in Digital Imagery Using Two Artificial Intelligence Approaches," Photogrammetric Engineering and Remote Sensing, Vol. 63, No. 5, pp. 493-500.

E.H. Bosch, 1997. "Classifying Multitemporal Hyperspectral Imagery Utilizing Neural Networks," Proceedings of the International Symposium on Spectral Sensing Research, December 13-19, 1997, San Diego, CA.

D. Landgrebe, 1997. "On Information Extraction Methods for Hyperspectral Data," Proceedings of the International Symposium on Spectral Sensing Research, San Diego, CA.

E.H. Bosch and J.A. Shine, 1996. "Evolutionary Experiments to Optimize Hidden Layers for Hyperspectral Imagery Classification," Proceedings of the World Congress on Neural Networks, San Diego, CA.

R.S. Rand, 1995. "Exploitation of Hyperspectral Data Using Discriminants and Constrained Linear Subpixel Demixing to Perform Automated Material Identification," Proceedings of the International Symposium on Spectral Sensing Research, November 26 - December 1, 1995, Melbourne, Australia.

A. Goltsev, 1996. "An Assembly Neural Network for Texture Segmentation," Neural Networks, Vol. 9, No. 4, pp. 643-653.

P.P. Raghu, R. Poongodi and B. Yegnanarayana, 1995. "A Combined Neural Network Approach for Texture Classification," Neural Networks, Vol. 8, No. 6, pp. 975-987.

E.H. Bosch, 1995. "The Effects Different Neural Network Architectures Have on the Exploitation of Hyperspectral Data," Proceedings of the International Symposium on Spectral Sensing Research, 26 November - 1 December 1995, Melbourne, Australia.

J. Barhen, N. Toomarian and A. Fijany, 1994. "Learning Without Local Minima," IEEE World Congress on Computational Intelligence, June 27-29, 1994, Orlando, FL.

S. Haykin, 1994. "Neural Networks: A Comprehensive Foundation," Macmillan College Publishing Company, Inc., Englewood Cliffs, NJ.

S. Kamata and E. Kawaguchi, 1993. "A Neural Net Classifier for Multi-Temporal Landsat Images Using Spatial and Spectral Information," IEEE Proceedings of 1993 International Joint Conference on Neural Networks, pp. 2199-2202.

J. Zurada, 1992. Introduction To Artificial Neural Systems, West Publishing Company, St. Paul, MN.

D.E. Rumelhart and J.L. McClelland, 1986. "Parallel Distributed Processing," MIT Press, Cambridge, MA, Vols. 1-2.

P.J. Werbos, 1974. "Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences," Doctoral Dissertation, Appl. Math., Harvard University, MA.