A Framework for Update Process in GIS

Laurent Spéry
LIRMM, 161 rue ADA, 34392 Montpellier Cedex 5, France
Email: spery@lirmm.fr



Abstract

Data update is a significant stage in the life of a geographical information system (GIS). In many cases, GIS are based on data provided by organizations such as the cadaster, and the data transfer between such a producer and a user is often a bulk transfer. We study the problems arising from the integration of this new data into the user GIS. To improve this integration, we propose a two-step process: (1) altered objects are detected in the transfer dataset; (2) connections are then established between the altered reference data and the data that replaces it.

1. Introduction

Nowadays, GIS are used by many local or federal authorities. These information systems help decision-makers. They are based on specific geographical data, which is often provided by specialized institutes with specific missions. For instance, in France the cadaster manages the cadastral map for tax purposes. The cadaster is thus a data producer that delivers information to users.

In this context, the update process is a difficult task. First, the producer updates its products and regularly delivers new datasets to users. Users must then integrate this new data into their information systems.

Section 2 presents the producer and user context of an update process. Section 3 specifies the problems raised by the operations described in section 2. Section 4 presents a process to improve the update task carried out by the user, and section 5 concludes.

2. Context

2.1 The user information system

For users, GIS are set up to support specific tasks such as town planning or facility management. Such information systems therefore focus on the operations that are useful to perform these tasks.

Data stored in these information systems is described according to a schema (called User Schema in figure 1). This schema expresses the user's representation of the real world according to a data model (entity-relationship, object-oriented, ...) and should also take user needs into account. However, the data may have different origins.


Figure 1: User GIS

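According to their origin, we distinguish two kinds of data, which we call reference data (data built up and delivered by an external producer) and user data (data captured by the user himself).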

Reference data and user data are built through distinct processes. For user data, the user is responsible for the initial data capture and for the processing this data undergoes; he is thus himself a data producer. This process is outside the scope of this paper. Reference data, in contrast, is built up by a producer who delivers it.

Reference data and user data coexist in the user information system, and they are bound by semantic and spatial relations. Commercial GIS make it possible to separate data into distinct information layers. According to [Laurini and Thompson, 1992], such an organization may correspond to various perceptions of the real world made by the user; these layers are thus "views" such as those defined for databases, each layer corresponding to a specific topic. This organization in layers enables us to distinguish between reference data and user data. Two implementations are thus possible: reference data and user data are stored either in distinct information layers or in a single information layer.

Example: We consider the management of a utility network. The information system often uses cadastral data (e.g. parcels, buildings) to locate the network; this cadastral data is therefore reference data. The description of the network and the meter attached to each building are user data. There is a spatial relation between a meter and the building to which it is attached. If this data is divided into several information layers, buildings and meters are only graphically superimposed (see figure 2(a)). In the case of a single information layer, spatial relations (which can be topological) between reference and user data are managed (see figure 2(b)).


Figure 2: Different Implementations

These two implementations raise different problems when reference data is updated; these problems are the subject of section 3. The following subsection presents how reference data may be integrated into a user GIS.

2.2 The integration of reference data

Data transfer between the producer and the user concerns reference data. The data delivered by the producer is included in a specific product, which we call the transfer dataset. Only bulk transfer is considered here. Like a snapshot, the transferred dataset describes the state of the world at transfer time. Such a data transfer is composed of data, a schema that structures the transferred data (called the transfer schema) and sometimes metadata (see figure 3). The transfer dataset is encoded according to a given format.
Example: The transfer of the French digital cadastral map (DCM) follows the EDIGéO format. The transfer schema is specified by the producer [Direction Générale des Impôts, 1995]. The information contained in a dataset is mainly graphic and describes land parcels, buildings and topographic data. Information about land owners is delivered separately.
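The composition of a transfer dataset (data, transfer schema, optional metadata, encoding format) can be pictured as a small data structure. The sketch below is a hypothetical in-memory representation, not the EDIGéO encoding: the class `TransferDataset`, its fields and the helper `schema_violations` are our own illustrative names. It shows one benefit of delivering the transfer schema with the data, namely checking that every transferred object conforms to it before integration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TransferDataset:
    """Hypothetical in-memory view of one bulk transfer (not the EDIGéO layout)."""
    encoding_format: str                   # name of the exchange format
    transfer_schema: dict[str, set[str]]   # object type -> attribute names declared by the producer
    objects: list[dict]                    # each object: {"type": ..., "id": ..., <attributes>}
    metadata: Optional[dict] = None        # metadata is not always delivered


def schema_violations(ds: TransferDataset) -> list[str]:
    """Return ids of objects whose type or attributes are not declared in the transfer schema."""
    bad = []
    for obj in ds.objects:
        declared = ds.transfer_schema.get(obj["type"])
        attributes = set(obj) - {"type", "id"}
        if declared is None or not attributes <= declared:
            bad.append(obj["id"])
    return bad


# Made-up data: the building carries an undeclared "height" attribute.
ds = TransferDataset(
    encoding_format="EDIGéO",
    transfer_schema={"parcel": {"number", "geometry"}, "building": {"geometry"}},
    objects=[
        {"type": "parcel", "id": "P1", "number": "0012", "geometry": None},
        {"type": "building", "id": "B1", "geometry": None, "height": 9},
    ],
)
print(schema_violations(ds))  # ['B1']
```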


Figure 3: Data Transfer

The user cannot directly use the data provided by the producer: this data must undergo several transformations during the integration process.

These three stages must be completed to integrate the information provided by the producer, and they are long and tedious. Once reference data has been integrated into the user information system, it may also be supplemented with user data.

However, due to human intervention or natural phenomena, the geographical space evolves, and the information system has to represent these evolutions.

3. The update transfer

3.1 Context of the update transfer

The producer regularly provides the user with new datasets. These datasets describe the state of the studied area at transfer time. Such successive data transfers make it possible to update reference data in the user information system.

For each data transfer, the user has to repeat the integration process described in the previous section. If neither the transfer schema nor the user schema evolves, the rules and correspondences established between the two schemata remain valid. This integration process creates a new dataset in the user database, so the user information system now contains two different datasets: the old reference dataset and the new one.

This state is described in figure 4.


Figure 4: User GIS after the integration of a new dataset
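When neither schema evolves, the correspondences established during the first integration can simply be replayed on each new transfer. The sketch below assumes, for simplicity, that these correspondences reduce to a renaming of object types and attributes; real schema integration rules are usually richer. `CORRESPONDENCES`, `integrate` and all attribute names are hypothetical.

```python
# Hypothetical correspondence rules, written once when the first transfer was
# integrated: transfer object type -> (user class, {transfer attr: user attr}).
CORRESPONDENCES = {
    "parcel": ("Parcel", {"number": "parcel_no", "geometry": "shape"}),
    "building": ("Building", {"geometry": "shape"}),
}


def integrate(transfer_objects, rules=CORRESPONDENCES):
    """Translate transfer objects into user-schema records using fixed rules."""
    integrated, unmapped = [], []
    for obj in transfer_objects:
        rule = rules.get(obj["type"])
        if rule is None:
            # No rule for this type: the transfer schema may have evolved.
            unmapped.append(obj)
            continue
        user_class, attribute_map = rule
        record = {"class": user_class, "id": obj["id"]}
        for transfer_attr, user_attr in attribute_map.items():
            record[user_attr] = obj.get(transfer_attr)
        integrated.append(record)
    return integrated, unmapped


records, leftovers = integrate([
    {"type": "parcel", "id": "P7", "number": "0031", "geometry": "g7"},
    {"type": "hedge", "id": "H1"},
])
print(records)                         # [{'class': 'Parcel', 'id': 'P7', 'parcel_no': '0031', 'shape': 'g7'}]
print([o["id"] for o in leftovers])    # ['H1']
```

Objects of a type without a rule are set aside rather than dropped, since their presence suggests that the rules must be revised before the transfer is integrated.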

The consequences of this situation depend on the implementation chosen by the user.

For the user, old reference data has to be replaced with new data, and inconsistencies may then appear within the system. Indeed, when a reference object is modified, the relations previously established with this object must also be modified. In a GIS, semantic relations are managed explicitly: for an altered object, new semantic relations are established in accordance with the database schema. Spatial relations between objects, however, are managed implicitly. The user must therefore check that the spatial relations between objects do not contradict their semantic definition, which requires establishing spatial integrity constraints between object types.
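As an illustration of such a spatial integrity constraint, take the utility network example of section 2.1: a meter is semantically attached to a building, so its position should lie inside that building. The sketch below is a minimal check under assumptions of our own: meters are points, buildings are simple polygons, and the dictionaries and function names are hypothetical.

```python
Point = tuple[float, float]
Polygon = list[Point]  # simple polygon, vertices in order, not repeated at the end


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Even-odd ray casting test (points exactly on an edge may go either way)."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def check_meter_constraint(meters, buildings):
    """Return ids of meters whose position contradicts their semantic link.

    meters:    {meter_id: (building_id, (x, y))}   hypothetical user data
    buildings: {building_id: polygon}              reference data after update
    A meter violates the constraint if its building no longer exists or if
    its position is not inside the building polygon.
    """
    violations = []
    for meter_id, (building_id, position) in meters.items():
        poly = buildings.get(building_id)
        if poly is None or not point_in_polygon(position, poly):
            violations.append(meter_id)
    return violations


# After the update, B1 has shrunk and B2 has been deleted.
buildings = {"B1": [(0, 0), (4, 0), (4, 10), (0, 10)]}
meters = {"M1": ("B1", (2, 5)), "M2": ("B1", (7, 5)), "M3": ("B2", (1, 1))}
print(check_meter_constraint(meters, buildings))  # ['M2', 'M3']
```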

3.2 Constitution of new reference data

The integration process depends on the intended use of the information system. The problems arising from the introduction of new reference data concern the management of the existing relations between reference data and user data. Depending on the user implementation (one or several information layers) and on the way reference data is used, introducing new reference data into an information system may be more or less complex.

Several information layers. Reference data is stored in a specific information layer.

A single information layer. Reference data and user data are stored in the same information layer. In this context, the GIS manages spatial relations between these two data types, which raises several problems when reference data is replaced.

In many cases, the inconsistencies appearing in the user information system are related to data semantics or to the semantics of spatial relations, and they are closely connected with the implementation chosen by the user for his information system. Given the problems raised by the introduction of new reference data into the user information system, the complete replacement of reference data is a difficult task. Hence, the user often seeks to limit the update process to the data items that evolved.

4. To improve the update integration process

To reduce the number of operations carried out during the update process, the user seeks to isolate, within the update dataset, the data that underwent transformations. After the integration process, the new and old reference datasets can be compared to detect the changes that occurred. This step consists of a systematic comparison of the objects stored in both datasets: for each object, modifications affecting its spatial and non-spatial properties are searched for. [Lemarié and Raynal, 1996] present tools for comparing datasets.
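The post-integration comparison described above can be sketched as follows. This is a minimal illustration, not the tool of [Lemarié and Raynal, 1996]: it assumes that objects are matched through a shared `id` and that geometries are stored as exactly comparable coordinate tuples. Function names are hypothetical.

```python
def compare_objects(old: dict, new: dict, spatial_keys=("geometry",)):
    """Report which spatial and non-spatial properties differ between two states of an object."""
    keys = (set(old) | set(new)) - {"id"}
    changed = {k for k in keys if old.get(k) != new.get(k)}
    return {
        "spatial": sorted(changed & set(spatial_keys)),
        "non_spatial": sorted(changed - set(spatial_keys)),
    }


def compare_datasets(old_ds, new_ds):
    """Systematically compare the objects present in both datasets (matched by id)."""
    old_by_id = {o["id"]: o for o in old_ds}
    report = {}
    for obj in new_ds:
        previous = old_by_id.get(obj["id"])
        if previous is not None:
            diff = compare_objects(previous, obj)
            if diff["spatial"] or diff["non_spatial"]:
                report[obj["id"]] = diff
    return report


old = [{"id": "B1", "geometry": ((0, 0), (4, 0), (4, 4)), "use": "house"},
       {"id": "B2", "geometry": ((5, 5), (6, 5), (6, 6)), "use": "shed"}]
new = [{"id": "B1", "geometry": ((0, 0), (4, 0), (4, 4)), "use": "shop"},
       {"id": "B2", "geometry": ((5, 5), (7, 5), (7, 7)), "use": "shed"}]
print(compare_datasets(old, new))
# {'B1': {'spatial': [], 'non_spatial': ['use']}, 'B2': {'spatial': ['geometry'], 'non_spatial': []}}
```

Real geometries would need a tolerance rather than strict equality, since re-digitized coordinates rarely match exactly; and every object of the bulk transfer must be visited, which is the cost the next paragraph argues against.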

However, we believe that changes should be identified before the integration process. Indeed, in a bulk transfer the number of changes is small compared to the volume of transferred data. If altered objects are identified in the transfer dataset before their integration, the number of objects to be integrated is limited. The following subsection shows how geographical features that underwent modifications may be isolated within the transfer dataset.

4.1 To isolate changes

We use the information contained in the transfer dataset to isolate altered objects, taking the DCM transfer as our case study. The basic evolutions then correspond to cadastral operations such as parcel division, merging, extraction, or integration into State property.

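To isolate the changes that occurred, we use either the object identifier, which is a geographic identifier, or temporal information about the update time. Comparing the identifiers of two datasets yields the classification shown in Table 1.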

New dataset \ Old dataset    Present      Missing
Present                      Unaltered    Created
Missing                      Deleted      -

Table 1: Result of the comparison of two datasets based on identifiers
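Table 1 amounts to set operations on the identifiers of the two datasets. A minimal sketch, assuming identifiers are hashable strings and that, as in the table, an identifier present in both datasets denotes an unaltered object; `classify_by_identifier` is a hypothetical name.

```python
def classify_by_identifier(old_ids, new_ids):
    """Classify identifiers according to Table 1."""
    old_ids, new_ids = set(old_ids), set(new_ids)
    return {
        "unaltered": sorted(old_ids & new_ids),  # present in both datasets
        "created": sorted(new_ids - old_ids),    # present only in the new dataset
        "deleted": sorted(old_ids - new_ids),    # present only in the old dataset
    }


print(classify_by_identifier(["P1", "P2", "P3"], ["P1", "P3", "P4", "P5"]))
# {'unaltered': ['P1', 'P3'], 'created': ['P4', 'P5'], 'deleted': ['P2']}
```

The "created" and "deleted" lists are exactly the objects that must be integrated or retired; the empty cell of the table (missing from both) never arises because only identifiers present in at least one dataset are examined.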

Both mechanisms allow the user to draw up a list of the objects that must be integrated into his information system. These items, and the information relative to them (e.g. metadata), should be extracted from the transfer dataset. Each altered object is integrated separately into the user information system: this process extracts the spatial and non-spatial descriptions of the object, without taking into account possible relations between the object and its neighbourhood.
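The temporal mechanism can be sketched as a filter on update dates. This is a minimal sketch assuming that each transferred object carries a hypothetical `updated` date and that the date of the previous transfer is known; neither name is taken from the DCM transfer schema.

```python
from datetime import date


def extract_updated(transfer_objects, last_transfer: date):
    """Keep the objects updated after the previous transfer.

    Assumes a hypothetical 'updated' attribute holding a datetime.date;
    objects without it are kept, because they cannot be shown to be unaltered.
    """
    return [
        obj for obj in transfer_objects
        if obj.get("updated") is None or obj["updated"] > last_transfer
    ]


objects = [
    {"id": "B1", "updated": date(1996, 3, 1)},
    {"id": "B2", "updated": date(1997, 6, 1)},
    {"id": "B3"},
]
print([o["id"] for o in extract_updated(objects, date(1997, 1, 1))])  # ['B2', 'B3']
```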

Such processes enable us to extract from the bulk transfer a subset limited to the new state of the objects that underwent modifications. Only this new state is integrated into the user information system. It is then useful to analyze the filiations between the old and the new state of reference data.

4.2 To establish filiation

Several reference datasets coexist in the user information system. The old dataset may include objects that are no longer up to date, while the new state of the objects that evolved is also stored in the system. These new objects should then be associated with the data they replace. Therefore, from the previous results, we establish filiations between old and new objects.

For land parcels, the user has a list of the altered items. This list includes a set of items that appeared in the new dataset (i.e. new parcels) and a set of items that disappeared from the old dataset (i.e. old parcels).

Between successive data transfers, the cadaster performs several updates. Basic updates (division, merging, extraction, or integration into State property) follow one another in various combinations, and their number is large when the area undergoes many successive cadastral reconfigurations. Hence, a new land parcel may result from fractions of several destroyed parcels. To associate new and old land parcels, we therefore compute the spatial intersection between the parcels that disappeared from the old dataset and those that appeared in the new dataset.

This mechanism establishes a relation between an old parcel and its descendants. However, it provides no information about the intermediate events that led a land parcel to its final state, so a complete history cannot be built for these land parcels.
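Since the building mechanism described next is said to be similar and relies on an intersection, the parcel association can be sketched as a spatial intersection between deleted and created parcels. This is a minimal sketch under simplifying assumptions of our own: parcels are axis-aligned rectangles, and a link is created only when the shared area exceeds a hypothetical threshold `min_area`, so that parcels sharing only a boundary are not linked.

```python
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax): simplified parcel shape


def overlap_area(a: Rect, b: Rect) -> float:
    """Area shared by two rectangles (0.0 if they are disjoint or only touch)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def parcel_filiation(deleted: dict[str, Rect], created: dict[str, Rect], min_area: float = 0.0):
    """Return (old_id, new_id) pairs whose shapes overlap by more than min_area."""
    return [
        (old_id, new_id)
        for old_id, old_shape in deleted.items()
        for new_id, new_shape in created.items()
        if overlap_area(old_shape, new_shape) > min_area
    ]


# P2 is divided; one fraction is merged with P3 to form P5.
deleted = {"P2": (0, 0, 10, 10), "P3": (10, 0, 20, 10)}
created = {"P4": (0, 0, 5, 10), "P5": (5, 0, 20, 10)}
print(parcel_filiation(deleted, created))  # [('P2', 'P4'), ('P2', 'P5'), ('P3', 'P5')]
```

In this made-up example P5 descends from two destroyed parcels. A positive `min_area` could filter out sliver overlaps caused by digitizing differences; as noted above, the links say nothing about the intermediate steps.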

For buildings, the user only has the list of the objects modified in the new dataset; unlike the parcel case, we do not know which objects of the old reference data were modified. We therefore compute the intersection between the reference data already integrated into the user information system (i.e. old buildings) and the new objects (i.e. new buildings), using a mechanism similar to the previous one.
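For buildings, every old building is a potential candidate because the transfer does not say which ones were modified. The sketch below assumes rectangular footprints and adds a uniform grid index, which is our own choice and not part of the described mechanism, so that each new building is tested only against nearby old buildings. The old buildings matching at least one new building are the ones presumed modified. All names and the cell size are hypothetical.

```python
import math
from collections import defaultdict

Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax): simplified footprint


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share a region of positive area."""
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])


def grid_cells(r: Rect, size: float):
    """Yield the (i, j) cells of a uniform grid that the rectangle touches."""
    for i in range(math.floor(r[0] / size), math.floor(r[2] / size) + 1):
        for j in range(math.floor(r[1] / size), math.floor(r[3] / size) + 1):
            yield (i, j)


def match_new_buildings(old: dict[str, Rect], new: dict[str, Rect], cell: float = 50.0):
    """Map each new building to the old buildings it overlaps."""
    grid = defaultdict(set)  # grid cell -> ids of old buildings touching it
    for old_id, r in old.items():
        for c in grid_cells(r, cell):
            grid[c].add(old_id)
    matches = {}
    for new_id, r in new.items():
        candidates = set()
        for c in grid_cells(r, cell):
            candidates |= grid.get(c, set())
        matches[new_id] = sorted(o for o in candidates if overlaps(old[o], r))
    return matches


old = {"B1": (0, 0, 10, 10), "B2": (100, 100, 110, 110), "B3": (20, 0, 30, 10)}
new = {"N1": (0, 0, 12, 10), "N2": (200, 200, 210, 210)}
matches = match_new_buildings(old, new)
print(matches)                                              # {'N1': ['B1'], 'N2': []}
print(sorted({o for ids in matches.values() for o in ids}))  # ['B1']
```

Here N1 replaces the enlarged B1, while N2 has no ancestor and is a building on previously empty land.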

To substitute reference data. The proposed mechanism does not solve all the problems that arise during the integration process, but we believe that it makes this process easier. Indeed, establishing correlations between user data and reference data is now limited to the subset of the new dataset that has been modified. It is thus possible to determine which user data is associated with the reference data that changed, by computing the intersection between the user data and this reference data. Then, for each new object, the user data potentially associated with it is known.
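The association between user data and changed reference data can be sketched with meters as points and changed reference objects as rectangles, both simplifying assumptions. `associate_user_data` is a hypothetical name; for each new object it returns the user data potentially associated with it, so only these objects need the user's attention.

```python
Point = tuple[float, float]
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax): simplified reference object


def contains(r: Rect, p: Point) -> bool:
    """True if the point lies inside the rectangle or on its boundary."""
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def associate_user_data(user_points: dict[str, Point], changed_refs: dict[str, Rect]):
    """For each changed reference object, list the user objects lying inside it."""
    return {
        ref_id: sorted(uid for uid, p in user_points.items() if contains(r, p))
        for ref_id, r in changed_refs.items()
    }


meters = {"M1": (2, 5), "M2": (50, 50), "M3": (11, 5)}
changed = {"N1": (0, 0, 12, 10), "N2": (40, 40, 60, 60)}
print(associate_user_data(meters, changed))  # {'N1': ['M1', 'M3'], 'N2': ['M2']}
```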

5. Conclusion

In the context of GIS, we have studied the problems arising from data transfer between a producer and a user. For the user, integrating new reference data into an existing information system is a difficult operation because, depending on the implementation chosen by the user, links between reference data and user data must be managed. We have therefore analyzed these relations and the problems arising when an update occurs. This analysis led us to an approach based on the detection of changes before the integration of reference data into the user information system. We believe that our process simplifies the integration of new reference data. The filiation links established between old and new instances should also ease the substitution of old reference data and the management of possible relations between this data and user data. This task is the basis of our future work.

Acknowledgement

The author is grateful to Dr Thérèse Libourel for her valuable advice and comments.

References

[Clement et al., 1997] Clement, G., Larouche, C., Gouin, D., Morin, P. and Kucera, H., 1997. OGDI: Toward Interoperability among Geospatial Databases. SIGMOD Record 26(3), pp. 18-23.

[Direction Générale des Impôts, 1995] Direction Générale des Impôts, 1995. Standard d'échange des objets du plan cadastral informatisé.

[Dubreil, 1996] Dubreil, F., 1996. Etude de l'intégration de la mise à jour des données BDCarto dans les bases utilisateur de l'équipement. Master's thesis, Ecole Nationale des Sciences Géographiques, Saint-Mandé, France.

[Laurini and Thompson, 1992] Laurini, R. and Thompson, D., 1992. Fundamentals of Spatial Information Systems. The A.P.I.C. Series, Academic Press.

[Lemarié and Raynal, 1996] Lemarié, C. and Raynal, L., 1996. Geographic data matching: first investigations for a generic tool. In: GIS/LIS'96, pp. 405-420.

[Nyerges, 1989] Nyerges, T. L., 1989. Schema Integration Analysis for the Development of GIS Databases. Int. Journal of Geographical Information Systems 3(2), pp. 153-183.

[Spaccapietra et al., 1996] Spaccapietra, S., Parent, C. and Devogele, T., 1996. Analysis of Discrepancies in Spatial Data Representation. In: Cooperative Database Systems for Advanced Applications (CODAS), University of Kyoto, ACM Japan and ACM SIGMOD Japan, Kyoto, Japan.

[Ubeda and Egenhofer, 1997] Ubeda, T. and Egenhofer, M. J., 1997. Topological Error Correcting in GIS. In: M. Scholl and A. Voisard (eds), Advances in Spatial Databases SSD'97, LNCS 1262, Springer, pp. 283-297.