Stephen Harding
Center for Intelligent Information Retrieval, Department of Computer Science, University of Massachusetts, Amherst, MA 01003-4610
E-mail: harding@ciirsrv.cs.umass.edu
However, we believe that GeoComputation may best be characterized by the cumulative body of research expressed in the abstracts of the papers, posters, and keynote addresses from the five GeoComputation conferences, not by the work or definition of any one individual. Consequently, this paper does not attempt to define GeoComputation per se, but explores the scope or nature of GeoComputation by examining the body of research presented at the five conferences held between 1996 and 2000: at the University of Leeds, UK; the University of Otago, NZ; the University of Bristol, UK; and Mary Washington College, USA; as well as most of the abstracts submitted for the conference at the University of Greenwich, UK. In other words, this is a bottom-up approach: we look at GeoComputation in terms of what GeoComputation researchers say they do.
Text analysis software developed by the Center for Intelligent Information Retrieval at the University of Massachusetts was used for the analysis. Word and phrase frequencies in the abstracts for each conference were analyzed separately and then compared. The results provide insight into GeoComputation by describing the range of research topics, core technologies, and concepts encompassed by GeoComputation. General trends and patterns are identified and defined in a semi-quantitative manner.
In the Epilogue of Geocomputation: A Primer, the same book in which Couclelis' paper appears, Macmillan (1998) essentially takes issue with Couclelis' position. He believes that GeoComputation includes the latest forms of computational geography and that it is not an incremental development. He accepts that sound theory is needed, but believes that it has, to some extent, already been provided by Openshaw, at least in the form of inductivism. Macmillan's "definition" of GeoComputation is much broader than that suggested by Couclelis; he believes GeoComputation ". . . is concerned with the science of geography in a computationally sophisticated environment." (p. 258).
Gahegan (1999), like Couclelis, sees the concern of GeoComputation as ". . . to enrich geography with a toolbox of methods to model and analyze a range of highly complex, often non-deterministic problems." (p. 204). But he views GeoComputation as an enabling technology, one needed to fill the ". . . gap in knowledge between the abstract functioning of these tools . . . and their successful deployment to the complex applications and data sets that are commonplace in geography." (p. 206). He also lists a series of challenges that GeoComputation must overcome, but these concern applying the sophisticated tools available to GeoComputation researchers and handling large, unwieldy data sets. Gahegan's is a practical approach to GeoComputation, but one with promise and vision, different from Couclelis' philosophical, possibly pessimistic, perspective.
Now to the work of Stan Openshaw, who, if anyone can be so called, is the father of GeoComputation. In the Preface to GeoComputation, Openshaw and Abrahart (2000) describe GeoComputation as a fun, new word. They see GeoComputation as a follow-on revolution to GIS: once GIS databases have been set up and expanded, GeoComputation takes over. They state that "GeoComputation is about using the various different types of geo-data and about developing relevant geo-tools within the overall context of a 'scientific' approach." (p. ix); it is about solving all types of problems and converting computer "toys" into tools that can be usefully applied. It is also about using existing tools to accomplish this and finding new uses for them. Finally, they link GeoComputation to high performance computing. As Couclelis and Gahegan did, Openshaw and Abrahart list a series of challenges for GeoComputation, but these are challenges inherent to GeoComputation, not challenges that GeoComputation must overcome to survive.
Openshaw (2000) states that GeoComputation ". . . can be regarded . . . as the application of a computational science paradigm to study a wide range of problems in geographical and earth systems . . . contexts." (p. 3). He identifies three aspects that make GeoComputation special. The first is its emphasis on "geo" subjects, i.e., GeoComputation is concerned with geographical or spatial information. Second, the intensity of the computation required is distinctive: it allows new or better solutions to be found for existing problems, and also lets us solve problems heretofore insoluble. Finally, GeoComputation requires a unique mind set, because it is based on ". . . substituting vast amounts of computation as a substitute for missing knowledge or theory and even to augment intelligence." (p. 5). Openshaw clearly sees GeoComputation as dependent upon high performance computing, as suggested above. He sees the challenge for GeoComputation as developing the new ideas, methods, and paradigms needed to use increasing computer speeds to do useful science in a variety of geo contexts.
Openshaw (2000) also looks at definitions of GeoComputation presented by Couclelis (1998), Longley (1998), and Macmillan (1998). He disagrees with Couclelis: GeoComputation is not just using computational techniques to solve spatial problems; it is a major paradigm shift affecting how computing is applied. Openshaw sees GeoComputation as far more than this, i.e., ". . . the use of computation as a front-line problem-solving paradigm which offers a new perspective and a new paradigm for applying science in a geographical context." (p. 8). Openshaw basically agrees with Macmillan's description of the nature of GeoComputation, but he would probably disagree with its scope: he views GeoComputation as something much larger and broader. GeoComputation relies on the potential of applying high performance computing to solve currently unsolvable or even unknown problems. It awaits the involvement of appropriately innovative, forward-thinking 'geocomputationalists' to achieve that potential.
Openshaw (2000) also notes that not all researchers agree with his definition of GeoComputation. He believes that this may be because other definitions, such as that of Couclelis, focused on the contents of presentations made at previous GeoComputation conferences, whereas a definition should be developed in a more abstract, top-down manner.
These top-down definitions rely on each author's perspective, based on their personal backgrounds in geography, to place GeoComputation in the overall context of geographic and computational research. Longley (1998), however, states that at this point we must assume that GeoComputation is ". . . what its researchers and practitioners do, nothing more, nothing less . . ." (p. 9). Notwithstanding Openshaw's (2000) comments about this type of definition, we agree with Longley's statement and therefore use the cumulative body of research, as expressed in the abstracts of the papers, posters, and keynote addresses from the GeoComputation conferences, to characterize GeoComputation. We do not attempt to define GeoComputation, but explore its scope or nature by examining the body of research presented at the five conferences held at the University of Leeds, UK, in 1996; at the University of Otago, Dunedin, NZ, in 1997; at the University of Bristol, UK, in 1998; at Mary Washington College, Fredericksburg, USA, in 1999; and at the University of Greenwich, Chatham, UK, in 2000. We attempt to determine "what's in" and "what's out," and to evaluate more subtle changes in research emphasis over time. In other words, this is a bottom-up approach: we look at GeoComputation in terms of what GeoComputation researchers say they do (within the bounds of acceptable material as determined by the individual conference organizers) and use this information to elucidate future trends. What makes this approach different is that we analyze the abstracts of the five conferences in a semi-quantitative way.
The abstracts from the Leeds, Bristol, and Fredericksburg conferences were transferred from the GeoComp 99 CD-ROM proceedings (Diaz, et al., 1999) to a word processor. Abstracts for the Bristol keynote lectures and those from the Dunedin conference were entered manually from Longley, et al. (1998) and Pascoe (1997), respectively. Those from the Chatham conference were downloaded from a web site set up by the conference organizers for review by the International Steering Committee for GeoComputation or received directly from the conference organizers by email. One file was made for each conference. These files were then edited to remove all parentheses, numbers, equations, special characters, bolding, underlining, and italics. All references cited in the text or at the end of an abstract were removed, as were place names and the names of individuals (e.g., "Horton" as in "Horton's method") and institutions. All acronyms and abbreviations were written out in full, except for "GIS" and "www." Finally, the English was standardized using the UK English option in the word processor spell checker. The files were saved in ASCII format and sent to the Center for Intelligent Information Retrieval in the Department of Computer Science at the University of Massachusetts for word and phrase analysis.
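The clean-up described above was done by hand in a word processor. The following minimal Python sketch shows how similar edits might be automated, offered only as an illustration: the acronym table and the function name `clean_abstract` are hypothetical, it assumes that only the parenthesis characters (not the words inside them) were removed, and it does not attempt to strip cited references or standardize UK spelling.

```python
import re

# Hypothetical acronym table: acronyms were written out in full except
# "GIS" and "www", so those two are deliberately absent here.
ACRONYMS = {"DEM": "digital elevation model", "GPS": "global positioning system"}


def clean_abstract(text: str, names: set[str]) -> str:
    """Approximate the manual clean-up applied to each abstract file."""
    # 1. Write acronyms out in full.
    for short, full in ACRONYMS.items():
        text = re.sub(rf"\b{re.escape(short)}\b", full, text)
    # 2. Drop names of people, places and institutions, with any possessive "'s".
    for name in names:
        text = re.sub(rf"\b{re.escape(name)}(?:'s)?\b", " ", text)
    # 3. Remove parenthesis characters (assumption: the enclosed words stay).
    text = re.sub(r"[()]", " ", text)
    # 4. Remove numbers.
    text = re.sub(r"\d+", " ", text)
    # 5. Replace other special characters, keeping letters and sentence
    #    punctuation so that sentence boundaries survive for phrase detection.
    text = re.sub(r"[^A-Za-z\s.,;:?!'-]", " ", text)
    # 6. Collapse runs of whitespace.
    return re.sub(r"\s+", " ", text).strip()


# prints: method uses a digital elevation model GIS.
print(clean_abstract("Horton's method (1945) uses a DEM & GIS.", {"Horton"}))
```

Sentence punctuation is kept on purpose, because the later phrase detection relies on sentence boundaries.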
Two files were generated for each conference, one containing words and the other phrases. Each file lists the words or phrases sorted in decreasing order of the number of times each is used in the abstracts and of the number of abstracts in which it occurs, as shown in Table 1. The phrase "spatial analysis" thus occurs nine times in eight abstracts, and the phrase "data model(s)" six times in four abstracts.
| Phrase |
|---|
| spatial analysis |
| digital elevation model(s) |
| spatial variation(s) |
| functional pattern |
| data model(s) |
| geographic space |
| visibility index(ices) |
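For readers who want to produce a listing like Table 1 from their own abstracts, the sketch below counts, for each word, how many times it is used and in how many abstracts it occurs, then sorts on both in decreasing order. It is a hypothetical re-creation in Python, not the retrieval software actually used; the function name `word_table` is an assumption, and tokenization is simplified to runs of letters and apostrophes.

```python
import re
from collections import Counter


def word_table(abstracts: list[str]) -> list[tuple[str, int, int]]:
    """Return (word, word frequency, abstract frequency), most frequent first."""
    word_freq = Counter()
    abstract_freq = Counter()
    for text in abstracts:
        words = re.findall(r"[a-z']+", text.lower())  # case is ignored
        word_freq.update(words)
        abstract_freq.update(set(words))  # each abstract counts once per word
    rows = [(w, word_freq[w], abstract_freq[w]) for w in word_freq]
    # Decreasing word frequency, then decreasing abstract frequency;
    # alphabetical order only breaks the remaining ties deterministically.
    rows.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return rows


for row in word_table(["spatial analysis of spatial data", "data models", "spatial data"]):
    print(row)
```

In this toy input, "spatial" is used three times but in only two abstracts, which is the same distinction drawn for "spatial analysis" (nine times in eight abstracts).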
WordNet, a dictionary-based lookup table, was used to identify parts of speech in order to find noun phrase candidates (Feng and Croft, 2000). Candidates were limited to a maximum of five consecutive words. Sentence boundaries and word order were respected, so phrase candidates spanning two or more sentences were not included.
A trained Markov model was then applied to extract the noun phrases from the phrase candidates. Delimiting rules included stop words (commonly occurring words such as "the," "a," or "and"), numbers, punctuation (e.g., hyphens, quotation marks, periods), verb patterns, and formatting delimiters, such as table fields and section heads.
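A minimal Python sketch of candidate generation under the rules just described: candidates are at most five consecutive words, never cross a sentence boundary, and are split at stop words, numbers, and punctuation. It does not perform the WordNet part-of-speech lookup or apply the trained model. The stop-word list is an illustrative subset, and the function name `phrase_candidates` and the two-word minimum are assumptions.

```python
import re

# Illustrative subset of a stop-word list; the actual list is not given here.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "for", "is", "are", "with", "by"}


def phrase_candidates(text: str, min_len: int = 2, max_len: int = 5) -> list[str]:
    """Word sequences that never cross a sentence boundary or a delimiter."""
    candidates = []
    for sentence in re.split(r"[.?!]+", text):  # candidates never span sentences
        runs, run = [], []
        # Tokens are runs of letters or single non-space characters (digits, punctuation).
        for token in re.findall(r"[A-Za-z]+|\S", sentence):
            if not token.isalpha() or token.lower() in STOP_WORDS:
                if run:  # a delimiter ends the current run of words
                    runs.append(run)
                run = []
            else:
                run.append(token.lower())
        if run:
            runs.append(run)
        # Every contiguous window of min_len..max_len words inside a run is a candidate.
        for words in runs:
            for i in range(len(words)):
                for n in range(min_len, max_len + 1):
                    if i + n <= len(words):
                        candidates.append(" ".join(words[i:i + n]))
    return candidates


print(phrase_candidates("Spatial data mining. The digital elevation model, 1998"))
```

Note that "mining digital" is never produced, because the period ends the first sentence before the second begins.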
A Markov model is a statistical process in which future events depend only on the present state, not on the path by which the present state was reached. The phrase detection model uses a set of states for each word position in the phrase: a single term has one set of states, while a five-word phrase has five such state sets. Each state set assigns a probability that each word in the vocabulary (all unique words in the collection) occurs at that position in a phrase. Non-noun words are removed from the end of phrase candidates, and the phrases are clustered based on occurrence frequency above a threshold value. Candidates not passing the threshold are discarded, leaving the proposed phrases.
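The trained Markov model itself (Feng and Croft, 2000) is not reproduced here. The Python sketch below illustrates only the two post-processing steps named in the last sentences above: trimming non-noun words from the end of each candidate, then discarding candidates whose frequency falls below a threshold. The noun set stands in for a WordNet lookup, and the function name `propose_phrases`, the two-word minimum, and the threshold of 2 are hypothetical choices.

```python
from collections import Counter


def propose_phrases(candidates: list[str], nouns: set[str], threshold: int = 2) -> dict[str, int]:
    """Trim trailing non-nouns from each candidate, then keep frequent phrases."""
    trimmed = []
    for cand in candidates:
        words = cand.split()
        # Remove non-noun words from the end of the candidate.
        while words and words[-1] not in nouns:
            words.pop()
        if len(words) >= 2:  # assumption: a phrase needs at least two words
            trimmed.append(" ".join(words))
    counts = Counter(trimmed)
    # Candidates below the (hypothetical) frequency threshold are discarded.
    return {phrase: n for phrase, n in counts.items() if n >= threshold}


nouns = {"data", "network", "networks", "analysis"}  # stand-in for a WordNet lookup
cands = ["neural network", "neural network trained", "neural networks",
         "spatial data", "spatial data shows", "cluster analysis"]
print(propose_phrases(cands, nouns))
```

A lookup of this kind also shows how errors such as "important research remains" can arise: if "remains" is listed as a noun, a trailing verb survives the trimming step.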
Each file was then run through a simple C program that sums the frequencies of occurrence of repeated entries. An intermediate file was produced in which identical phrase entries were summed within single abstracts; this was not done for words. The results of the summation process were then sorted by word or phrase frequency and by abstract frequency for each file. These final summation files were used to analyze word and phrase frequencies for each conference: "abstract frequency" in Table 1 thus represents the total number of abstracts at an individual conference that contained the word or phrase, and word or phrase frequency refers to the total number of times the word or phrase is used. The summations ignore case, so "GIS" and "gis," for example, were considered equal. Plural forms of phrases were then combined manually into one form. Thus, "neural network" and "neural networks" have combined statistics and are entered as "neural network(s)." A test set of data was checked for accuracy, and spot checks indicated that the results were satisfactory.
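The summation was done with a C program and the plural forms were merged manually; the Python sketch below shows the same bookkeeping under stated assumptions. Input is taken to be the intermediate file as (phrase, count within one abstract) pairs, case is folded, and a phrase is merged with its plural only when the plural is the same phrase plus a final "s". Both function names are hypothetical.

```python
from collections import defaultdict


def summarize(entries: list[tuple[str, int]]) -> dict[str, list[int]]:
    """entries: one (phrase, count) pair per phrase per abstract (the intermediate file)."""
    stats = defaultdict(lambda: [0, 0])  # phrase -> [phrase frequency, abstract frequency]
    for phrase, count in entries:
        key = phrase.lower()  # "GIS" and "gis" are treated as the same entry
        stats[key][0] += count
        stats[key][1] += 1
    return dict(stats)


def merge_plurals(stats: dict[str, list[int]]) -> dict[str, tuple[int, int]]:
    """Combine "x" and "xs" into a single "x(s)" entry by adding their statistics."""
    merged = {}
    for phrase, (freq, n_abs) in stats.items():
        if phrase.endswith("s") and phrase[:-1] in stats:
            key = phrase[:-1] + "(s)"  # plural with a singular partner
        elif phrase + "s" in stats:
            key = phrase + "(s)"  # singular with a plural partner
        else:
            key = phrase
        old_freq, old_abs = merged.get(key, (0, 0))
        merged[key] = (old_freq + freq, old_abs + n_abs)
    return merged


entries = [("Neural network", 3), ("neural networks", 1), ("neural network", 1),
           ("GIS software", 2), ("gis software", 1)]
print(merge_plurals(summarize(entries)))
```

The merged abstract frequency is a simple sum, so an abstract containing both the singular and the plural form contributes twice.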
The resulting files, two for each conference, were reviewed, and all words and phrases that occurred in only one abstract were deleted to reduce the files to a manageable size. Words and phrases meaningless in a GeoComputation context were also deleted, e.g., words such as "versa" (as in "vice versa") and "priori" (as in "a priori"), and phrases such as "paper describes," "works well," and "wide variety." Table 2 shows the changes in file size as these steps were carried out. Finally, the percent frequency for each word or phrase was determined by dividing the number of abstracts in which that word or phrase occurred by the number of abstracts at that particular conference. This not only permitted the words and phrases from one set of conference abstracts to be evaluated in a semi-quantitative manner, but also allowed comparison between conferences. Because plural forms were merged with singular forms for the phrases, some inflation occurs in the phrase percent frequencies: an abstract that uses both the singular and the plural form of a phrase is counted twice.
Table 2. Reduction of Data File Content

| Conference |
|---|
| Leeds |
| Dunedin |
| Bristol |
| Fredericksburg |
| Chatham |
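The filtering and normalization steps described above can be sketched in Python as follows, assuming a dictionary of abstract frequencies for one conference. The function name, the list of meaningless terms, the count of 80 abstracts in the usage example, and the rounding to one decimal place are illustrative assumptions, not values from the study.

```python
def percent_frequencies(abstract_freq: dict[str, int], n_abstracts: int,
                        meaningless: set[str]) -> dict[str, float]:
    """Drop single-abstract and meaningless terms; express the rest as % of abstracts."""
    result = {}
    for term, n_abs in abstract_freq.items():
        if n_abs < 2 or term in meaningless:
            continue  # deleted to keep the files manageable and relevant
        result[term] = round(100 * n_abs / n_abstracts, 1)
    return result


# Hypothetical counts for a conference with 80 abstracts.
freqs = {"spatial analysis": 8, "data model(s)": 4, "paper describes": 5, "fractal dimension": 1}
print(percent_frequencies(freqs, 80, {"paper describes", "works well", "wide variety"}))
```

Dividing by each conference's own abstract count is what makes a value at a small conference comparable with one at a large conference.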
As noted above, to compare words among the five conferences, the number of abstracts in which a word occurred at each conference was normalized by the number of abstracts presented at that conference. Table 3 lists the percent frequencies for the most frequently used words at each conference, and Table 4 shows the percent frequencies for the most frequently used words common to all five conferences, ordered by mean percent frequency. We attempted to restrict ourselves to the 25 most frequently used words at each conference, but this was not possible because word 25 typically fell in the middle of a group of words with the same frequency. The source data used for the word analysis are available for the reader's reference.
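The two ranking rules described above (carrying along every word tied with word 25, and ordering common words by mean percent frequency) might look like this in Python. The function names and the percent values in the usage example are hypothetical; ties in the mean are broken alphabetically only to make the output deterministic.

```python
from statistics import mean


def top_with_ties(pct: dict[str, float], n: int = 25) -> list[str]:
    """Top-n words by percent frequency, extended to every word tied with word n."""
    ranked = sorted(pct.items(), key=lambda kv: (-kv[1], kv[0]))
    if len(ranked) <= n:
        return [word for word, _ in ranked]
    cutoff = ranked[n - 1][1]  # percent frequency of word n
    return [word for word, p in ranked if p >= cutoff]


def common_word_ranking(by_conference: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Words found at every conference, sorted by decreasing mean percent frequency."""
    common = set.intersection(*(set(pct) for pct in by_conference.values()))
    means = {w: mean(pct[w] for pct in by_conference.values()) for w in common}
    return sorted(means.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical percent frequencies; with n=3 the tie at 40% pulls in a fourth word.
print(top_with_ties({"data": 80.0, "spatial": 70.0, "model": 40.0, "time": 40.0, "scale": 30.0}, n=3))
print(common_word_ranking({"Leeds": {"data": 80.0, "GIS": 50.0, "flow": 10.0},
                           "Dunedin": {"data": 60.0, "GIS": 70.0}}))
```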
Table 3: Most Frequently Used Words at Each of the Five Conferences
| Leeds (1996) | Dunedin (1997) | Bristol (1998) | Fredericksburg (1999) | Chatham (2000) |
|---|---|---|---|---|
| data | GIS | data | data | data |
| spatial | data | spatial | spatial | spatial |
| GIS | spatial | model | model | analysis |
| model | information | analysis | information | information |
| analysis | system | modelling | analysis | GIS |
| models | analysis | models | time | models |
| information | models | GIS | GIS | model |
| time | geographic | time | models | system |
| modelling | systems | scale | area | process |
| systems | time | area | modelling | time |
| number | environmental | information | process | area |
| areas | model | system | system | areas |
| system | classification | systems | number | map |
| process | tool | number | space | scale |
| area | field | process | systems | modelling |
| geographic | area | geographic | local | structure |
| processes | boundaries | processes | areas | number |
| range | computer | resolution | geographic | statistical |
| point | modelling | areas | resolution | region |
| statistical | tools | flow | processes | range |
| structure | database | space | surface | points |
| software | land | structure | digital | local |
| scale | local | environment | distribution | computer |
| computer | spatially | classification | features | objects |
| tools | areas | distribution | field | systems |
| interaction | dimensions | location | tools | classification |
| | landscape | numerical | values | software |
| | map | | | space |
| | maps | | | tools |
| | patterns | | | computational |
| | points | | | environment |
| | positioning | | | series |
| | scale | | | |
Table 4. Percent Frequencies for the 25 Most Frequently Used Common Words
| Word |
|---|
| data |
| spatial |
| GIS |
| analysis |
| model |
| information |
| models |
| time |
| modelling |
| system |
| systems |
| area |
| process |
| number |
| areas |
| tools |
| geographic |
| classification |
| distribution |
| processes |
| computer |
| field |
| tool |
| spatially |
| global |
From these data, we can see that geocomputationalists are more concerned with data and the spatial nature of their data than with modelling, results, or applications. Our data consist of numbers, points, and models, at least some of which are digital. The spatial nature of GeoComputation is exemplified by such words as areas, maps, patterns, scale, space, distribution, location, and region. The tools we use include GIS, statistics, classification, maps, positioning, and images. We apply the tools to processes, land, landscapes, and the environment over time. We deal with information, systems, and all things geographic, and we use computers to do this.
We can also look at tools and applications in this same way. The most common tools are unspecified; apart from GIS, the only specific tools among the words with the highest percent frequencies are classification and computer (Figure 2). This is because most tools are named by phrases, e.g., neural networks and cellular automata. Emphasis on unspecified tools was least at the Bristol conference and greatest at the Leeds and Dunedin conferences. If one combines the frequencies for the two words tool and tools, however, emphasis is quite consistent from conference to conference. The use of GIS was again highest at Leeds and Dunedin, and although percent frequencies were lower at the 1998, 1999, and 2000 conferences, they appear stable, ranging from about 36% to 42%, as noted above. Emphasis on classification was greatest at Dunedin; the percent frequencies at the other four conferences were similar, ranging from just under 15% to just over 17%. Percent frequencies for computer decreased consistently from 1996 through 1999, but rose slightly in 2000. Figure 2 suggests that emphasis on tools in general has diminished over time and that percent frequencies for individual tools are becoming more similar.
Longley (1998) implies that GIS provides the basis for GeoComputation, yet others (e.g., the conference announcement for Dunedin) state emphatically that this is not so. The percent frequencies in Table 4 support Longley's position. It should be noted, however, that many references to GIS in the abstracts are made in an almost negative or derogatory way: authors typically refer to advances or improvements that their work has made to "traditional GIS." "Geographic Information Systems" is, of course, a phrase, but all usages were converted to the word "GIS" in the abstracts before the word analysis was done. The words "geographic," "information," and "systems," as they appear in Table 4, are therefore separate words; typical uses are "geographic information" and "information systems."
Only two applications appear in the list of 25 words with the highest percent frequencies: process and processes (Figure 3). Percent frequencies for both were highest at Leeds in 1996, decreased at Dunedin, and then increased steadily from 1998 through 2000. Overall, however, percent frequencies for process and processes appear to be decreasing over time.
We can also look at words not included in the most frequently used list, for example, GeoComputation and geocomputational. Figure 4 shows how use of these terms has changed since 1996. Percent frequencies were low at Leeds, where the term was first introduced, and then increased appreciably at Dunedin. Percent frequencies for GeoComputation decreased from 1997 through 1999, then increased substantially in 2000. Geocomputational has decreased in frequency consistently since 1997. These percent frequencies, however, are quite low. GeoComputation was mentioned in about 3% of the abstracts in 1996, about 11% in 1997, about 7% in 1998, just over 2% in 1999, and about 12% in 2000. Geocomputational appeared in about 2% of the abstracts in 1996, 12% in 1997, just over 8% in 1998, and about 3.5% in both 1999 and 2000. Disregarding the Dunedin and Fredericksburg conferences, the percent frequencies have been remarkably consistent from conference to conference. It is possible that the very low percent frequencies in 1999 are due to "first use" of the term in the United States; very few participants at previous conferences were American, and many authors were thus unfamiliar with the term. GeoComputation is most commonly used in keynote presentations, in the context of its definition, nature, and scope.
At this point, limitations of time and space restrict the number of words that can be compared, so we have arbitrarily chosen to track, from 1996 to 2000, several application areas and several tools, as well as a couple of words of general interest to us. We look first at applications in human geography and hydrology; then at the tools statistics, artificial intelligence, and the Internet; and finally at time and data quality.
Because the term "GeoComputation" was coined with reference to human geography (Openshaw, 2000), let us begin with this disciplinary area. There are many words from all five conferences that relate to this subject, but few are exclusive to human geography or to the various subdisciplines within human geography. We have therefore selected five words that are related to human geography and are applicable to most areas of research within this field: social, demographic, census, city, and urban. The percent frequencies for these words are shown graphically in Figure 5. But first a few caveats: none of these words occur in two or more Dunedin abstracts, so this data set is not included in the analysis; demographic is used in only one Fredericksburg abstract, and was thus not included in the data set analyzed; and city does not appear in the Bristol abstracts. All terms, with the exceptions of demographic and census, have decreased in percent frequency from 1996 to 2000. Percent frequencies for all words were lowest at Fredericksburg in 1999. Percent frequencies for social, city, and cities were highest at the Leeds conference. The highest frequency for census was at the Chatham conference, which is not surprising with the 2001 UK census imminent. The low frequency for census at the Fredericksburg conference is surprising for the reverse reason; the 2000 US census was imminent at the time. The highest percent frequency for urban occurred at the Bristol conference in 1998 and the lowest at Fredericksburg. Bristol is the largest city in which a GeoComputation conference has been held, and Fredericksburg is the smallest.
Another application area of interest is hydrology (Figure 6). Papers on hydrology have been presented at all five conferences, although there were few at Dunedin. Eight words that occurred in at least four sets of abstracts and that are unequivocally related to hydrology were selected for analysis: catchment, drainage, flood, hydrologic(al), river, runoff, stream, and water. Percent frequencies were highest in 1998 and lowest in 1997, when only flood and water were used. Interestingly, the frequencies for these two words at Dunedin, 8.9% and 13.3%, respectively, were the highest for those words at all five conferences. Percent frequencies were relatively stable for the 1996, 1999, and 2000 conferences, at which these words appeared in about 43% to 52% of the abstracts. With respect to the individual words, percent frequencies for flood and water decreased from 1996 to 2000, and those for hydrologic(al) and stream increased. Percent frequencies for runoff and catchment are generally up; those for river and drainage appear to be relatively stable. It thus appears that hydrology continues to be a major application area for geocomputational research.
One tool not included among the 25 words with the highest percent frequencies, but which can be addressed using single words, is statistics (Figure 7). Percent frequencies for statistical and statistics were highest at the 1996 conference and then decreased considerably at the 1997 conference; between 1997 and 2000, however, they appear to have stabilized, with statistical appearing in about 6% to 10% and statistics in about 3.5% to just under 7% of the abstracts. Five words that can unequivocally be related to statistics and that occurred in abstracts from at least four conferences were chosen for analysis: correlated, multivariate, nonlinear, regression, and variance. All five words were used at the 1996, 1998, 1999, and 2000 conferences; none were used in 1997. Of these five words, only regression and variance appear to be increasing in use, and only multivariate is decreasing. Nonlinear appears to be rather unstable, whereas correlated is relatively stable. Percent frequencies are generally low: the only one over 25% is for nonlinear at Fredericksburg, and mean percent frequencies are below 8.5% for the remaining four words. Traditional statistical analysis nevertheless has a continuing, albeit low-level, presence in the conference series, and thus appears to be here to stay as a geocomputational tool.
Because artificial intelligence plays such a big role within GeoComputation, it is useful to look at tool words related to this specialty. Five words were selected for analysis: neural, fuzzy, expert, genetic, and automata (Figure 8). It is obvious that all five words are actually parts of phrases, i.e., neural network(s), fuzzy logic, expert systems, genetic algorithms and programming, and cellular automata, but we believe analysis of individual words may provide meaningful information about this area of expertise. All five words appear in the abstracts for 1996, 1998, 1999, and 2000; only neural, expert, and genetic occur in the 1997 abstracts. Percent frequencies for all five words were at their highest at Leeds. Expert and genetic decreased in percent frequency from 1996 through 1998, then increased from 1999 to 2000. Automata was relatively stable from 1996 through 1999, but decreased in frequency in 2000. Neural, the word with highest percent frequency, decreased in frequency between 1996 and 1997, and then increased from 1998 through 2000. Fuzzy decreased in frequency between 1996 and 1998, increased in 1999, but then decreased again in 2000. If one ignores the increase in 1999, percent frequencies for fuzzy seem to be decreasing over time. Overall, percent frequencies for these artificial intelligence tools decreased from 1996 to 1998, and then increased in frequency from 1998 to 2000, suggesting a renewal of interest.
One final tool that can be effectively evaluated using individual words is the Internet (Figure 9). References to the Internet itself and all associated terms peaked at Fredericksburg and Chatham. Furthermore, with the exception of the 1997 conference, percent frequencies have increased steadily since 1996. Four words were selected for analysis: Internet, web, worldwide web (abbreviated www in the word analysis), and online. Each word is present in the abstracts from at least three conferences. Percent frequencies are low: only web at Fredericksburg occurred in more than 20% of the abstracts. However, the number of these words used at individual conferences is increasing over time: three appeared in the Leeds abstracts; one in the Dunedin abstracts; three in the Bristol abstracts; and all four in both the Fredericksburg and Chatham abstracts. Use of online, which did not appear until 1998, and of Internet is increasing, and the percent frequency of web is generally increasing. Only percent frequencies for worldwide web are generally decreasing. This may result from general changes in word usage over time or from a lack of consistency by the person who prepared the abstracts for word and phrase analysis (JE). There are also many references to various web sites in abstracts from all five conferences. These references were not included in the analysis, primarily because they were given to provide information about the author's work beyond that found in the abstract. Regardless of the low percent frequencies, however, we believe this is an emerging technology in the field of GeoComputation.
Time is an important concept at all five conferences. Three words that deal with time, which occurred in the abstracts of three or more conferences, were selected for analysis: time, spatiotemporal, and temporal (Figure 10). Percent frequencies for all words, except time, are low, with a maximum of just under 17% for temporal at Fredericksburg. Three of these words were used in the Leeds, Fredericksburg, and Chatham abstracts; and two in the Dunedin and Bristol abstracts. The highest percent frequencies were achieved at Leeds. However, mean percent frequencies have decreased over time, possibly suggesting diminishing interest. Only spatiotemporal is increasing in use. Time itself is decreasing (!!). Only temporal may be stable; the pattern for this word is irregular.
Finally, as noted by Brooks and Anderson (1998), data quality is a key issue in GeoComputation. So before moving to the analysis of phrases, we look at how GeoComputation researchers treat data with reference to data quality, accuracy, errors, and uncertainty (Figure 11). Five words were selected for analysis (quality, accurate, error, errors, and uncertainty), although the word quality may be ambiguous in this context. All five words were used at the 1996, 1998, 1999, and 2000 conferences; only four were used in 1997. In general, use of quality is decreasing over time. However, percent frequencies for uncertainty and accurate increased from 1996 through 1999, although they decreased in 2000. Error appears relatively stable, but percent frequencies for errors are decreasing. With the emphasis on modelling in GeoComputation, often using synthetic data or data over which the modeller has little or no control (e.g., data from various Internet sites, digital elevation data, or census data), an understanding of data quality and the ability to address accuracy, error, and uncertainty quantitatively are often of crucial importance. One hopes the decreases in the percent frequencies for these words at Chatham are aberrations.
Several caveats are required before we begin the analysis of phrases. First, spot checks of the phrase source data show that what the software identified as a phrase is not necessarily one. Examples of such errors include "biochemistry exhibiting reflectance," "dimensional topological," and "important research remains." This problem is at least partly due to the complexity of the English language: "remains," for example, can be either a noun or a verb, and in this case the software identified a verb as a noun, producing a meaningless phrase. Second, certain phrases of interest to us, such as high performance computing and exploratory data analysis, were not identified by the software. Third, the software is arbitrary in its identification process. For example, we are interested in the phrase "artificial intelligence." Suppose that the software identified this phrase in two abstracts at one conference, but also identified the words "artificial" and "intelligence" as parts of a different phrase, e.g., "artificial intelligence technologies," in two abstracts. We cannot combine these and say that the phrase "artificial intelligence" occurs in four abstracts, because we do not know whether the two abstracts in which "artificial intelligence" occurs are the same as the two in which "artificial intelligence technologies" occurs. We could determine this by checking every phrase of interest in every abstract, but because of the laborious nature of this task, we chose to use the results of the phrase analysis as is. In this example, then, we would say there were only two occurrences of the phrase "artificial intelligence." Finally, it must be remembered that our data set includes only those phrases that occur in two or more abstracts at each conference. The results of the phrase analysis discussed below must be interpreted with these caveats in mind.
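The arithmetic behind the "artificial intelligence" caveat can be made concrete. If the abstract identifiers for each phrase variant were available, the combined count would be the size of the union of the two sets; without them, two plus two abstracts could mean anywhere from two to four. The Python sketch below assumes hypothetical integer abstract IDs, which the summary files did not record, and a hypothetical function name.

```python
def combined_abstract_count(ids_a: set[int], ids_b: set[int]) -> tuple[int, int, int]:
    """Return (lower bound, upper bound, actual) abstract counts for two phrase variants."""
    lower = max(len(ids_a), len(ids_b))  # one variant's abstracts all contain the other
    upper = len(ids_a) + len(ids_b)  # no abstract contains both variants
    actual = len(ids_a | ids_b)  # requires per-abstract IDs
    return lower, upper, actual


# Hypothetical IDs: "artificial intelligence" in abstracts 3 and 7,
# "artificial intelligence technologies" in abstracts 7 and 12.
print(combined_abstract_count({3, 7}, {7, 12}))
```

Reporting only the directly identified count, as done here, corresponds to taking the lower bound for the phrase of interest.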
Table 5A. Most Frequently Used Phrases, 1996, 1997, and 2000
| 1996 (Leeds) | 1997 (Dunedin) | 2000 (Chatham) |
|---|---|---|
| spatial data | spatial data | spatial data |
| data set(s) | spatial information | neural networks |
| neural network(s) | spatial analysis | data set(s) |
| spatial analysis | neural networks | spatial analysis |
| spatial object(s) | quantitative revolution | genetic algorithm(s) |
| spatial distribution | digital elevation | GIS software |
| knowledge base | aerial photographs | Voronoi diagram(s) |
| digital elevation model(s) | geographic space | census data |
| geographic data | geographic data | cluster analysis |
| genetic algorithm | spatial dimensions | data structure |
| expert system(s) | computational geography | digital elevation models |
| cellular automata | resource management | geographic information |
| data model(s) | expert systems | time series |
| data analysis | | |
| catchment area | | |
| data structure(s) | | |
| cellular automata | | |
| spatial relations | | |
| computational tool(s) | | |
| sensitivity analysis | | |
| computer simulation | | |
| fractal dimension | | |
| correlation coefficient | | |
| genetic programming | | |
| data mining | | |
| statistical analysis | | |
| digital photogrammetry | | |
| elevation model(s) | | |
| earth's surface | | |
| time series | | |
| economic variables | | |
| raster GIS | | |
| elevation model(s) | | |
| mathematical model(s) | | |
| GeoComputation techniques | | |
| artificial intelligence | | |
| GIS technology | | |
| analytical tool(s) | | |
| GIS application | | |
| modelling tools | | |
| grid modelling | | |
| physical processes | | |
| predictive models | | |
| regression analysis | | |
| proximity relations | | |
| spatial information | | |
| satellite images | | |