LIRMM, 161 rue ADA, 34392 Montpellier Cedex 5, France
Data update is a significant stage in the life of a geographical information system (GIS). In many cases, GIS are based on data provided by organizations such as the cadaster. The data transfer between a producer and a user is then often a bulk transfer. We study problems arising from the integration of this new data in the user GIS. To improve this data integration, we propose a two-step process: (1) Altered objects are detected in the transfer dataset. (2) Then, connections between altered reference data and data that replace it are established.
Nowadays, GIS are used by many local or federal authorities. These information systems help decision-makers. They are based on specific geographical data. This data is often provided by specialized institutes. These organizations have specific missions. For instance, in France the cadaster manages the cadastral map for tax purposes. The cadaster is thus a data producer that delivers information to users.
In this context, the update process is a difficult task. First, the producer updates its products. It then regularly delivers new datasets to users, who must integrate this new data into their information systems.
We present in section 2 the producer and user context within the framework of an update process. We specify in section 3 problems raised by operations described in section 2. Section 4 presents a process in order to improve the update task carried out by the user.
For users, GIS are set up to support specific tasks such as town planning or facility management. Thus, such information systems focus on the operations needed to perform these tasks.
Data stored in these information systems is described according to a schema (called User Schema in figure 1). This schema translates the user representation of the real world made according to a model (entity-association, object, ...). This schema should also take user needs into account. However, data may have distinct origins.
Figure 1: User GIS
We distinguish (according to its origin) data that we call:
Reference data and user data coexist in the user information system. Moreover, they are bound by semantic and spatial relations. Commercial GIS make it possible to separate data into distinct information layers. For [Laurini and Thompson, 1992], such an organization may correspond to various perceptions of the real world made by the user. Hence, these layers are "views" such as those defined for databases. Thus, a layer corresponds to a specific topic. This organization in layers enables us to distinguish between reference data and user data. Two implementations are thus possible:
Figure 2: Different Implementations
These two implementations imply different problems relative to reference data update. These problems are the subject of section 3. In the following subsection, we present how reference data may be integrated in a user GIS.
Data transfer between the producer and the user concerns reference data. Data delivered by the producer is included in a specific product. We call this product the transfer dataset. Only bulk transfer is considered here. This transferred dataset describes, like a snapshot, the state of the world at transfer time. Such a data transfer is composed of data, a schema that structures the transferred data (called transfer schema) and sometimes metadata (see figure 3). The transfer dataset is encoded according to a given format.
Example: The transfer of the French digital cadastral map (DCM) follows the EDIGéO format. The transfer schema is specified by the producer [Direction Générale des Impôts, 1995]. Information contained in a dataset is mainly graphic. The description concerns land parcels, buildings and topographic data. Information relative to land owners is delivered separately.
Figure 3: Data Transfer
The user cannot directly use data provided by the producer. This data must undergo some transformations during the integration process.
However, due to human intervention or natural phenomena, the geographical space evolves. The information system has to represent these evolutions:
The producer regularly provides new datasets to the user. These datasets describe the state of the studied area at transfer time. Such successive data transfers make it possible to update reference data in the user information system.
For each data transfer, the user has to repeat the integration process described in the previous section. If transfer schema and user schema do not evolve, rules and correspondences established between both schemata are still valid. This data integration process leads to the creation of a new dataset in the user database. Now, the user information system has two different datafiles:
Figure 4: User GIS after the integration of a new dataset
According to the implementations:
The integration process is dependent on the final functionality of the information system. Problems arising from the introduction of new reference data are relative to the management of the existing relations between reference and user data. According to user implementation (one or multiple information layers) and to the way of using reference data, the introduction of new reference data in an information system may be more or less complex.
Several information layers. Reference data is stored in a specific information layer.
Example: A user captures trees (user data). Reference data consists of buildings. Buildings are surrounded by trees. An integrity constraint may be expressed as: "trees are located outside buildings". If a side extension is built onto a house, some trees may end up inside the building, which violates the integrity constraint.
The user has to specify which constraints stored data should conform to. In this context, [Ubeda and Egenhofer, 1997] propose a process to define, to detect and to correct such errors in a GIS. It is however necessary to apply this mechanism to the whole dataset.
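As an illustration only, the constraint "trees are located outside buildings" from the example above can be checked with a simple point-in-polygon test. The sketch below is not the process of [Ubeda and Egenhofer, 1997]; it assumes that trees are points, that buildings are simple polygons given as vertex lists, and all names and coordinates (`point_in_polygon`, `violations`, `T1`, `H1`, ...) are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point lies strictly inside the polygon.

    polygon is a list of (x, y) vertices; points exactly on the boundary
    are not handled specially in this sketch.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def violations(trees, buildings):
    """Return (tree_id, building_id) pairs that break the constraint."""
    return [
        (tree_id, building_id)
        for tree_id, location in trees.items()
        for building_id, outline in buildings.items()
        if point_in_polygon(location, outline)
    ]


# Hypothetical data: a house before and after a side extension.
house_before = [(0, 0), (10, 0), (10, 8), (0, 8)]
house_after = [(0, 0), (14, 0), (14, 8), (0, 8)]
trees = {"T1": (12, 4), "T2": (20, 4)}

before = violations(trees, {"H1": house_before})
after = violations(trees, {"H1": house_after})
```

With this data, no tree violates the constraint before the update, while tree `T1` falls inside the extended house afterwards. Such a check has to be run against every building, which illustrates why applying it to the whole dataset is costly.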
Example: The department of transportation locates traffic accidents. These traffic accidents are user data. They are stored in a specific information layer. This data is superimposed on the road, which is reference data stored in another information layer. After an update transfer, new reference data is imported into the information system. Sometimes, traffic accidents are no longer located on a road, because the road alignment may have changed [Dubreil, 1996].
In this configuration, information layers are distinct. Therefore, each layer has its own geometrical description. Updates relative to reference data should be propagated towards user data which is connected to it.
Example: The intersection between a house (reference data) and a power line (user data) is represented by a node. If the power line is removed, this node must also be deleted.
Reference and user data may share a common graphic primitive. Then, reference data should be updated in accordance with user data. Sometimes, for reference data, spatial property may not be updated.
In many cases, inconsistencies appearing in the user information system are relative to data semantics or to spatial relations semantics. These problems are closely connected with the implementation chosen by the user for the information system. We have presented several problems arising from the introduction of new reference data in the user information system. The complete replacement of reference data is therefore a difficult task. Hence, users often seek to limit the update process to data items that have evolved.
To reduce the number of operations to be carried out during the update process, the user seeks to isolate, within the update dataset, the data that underwent transformations. After the integration process, it is possible to compare new and old reference datasets in order to detect changes that occurred. Hence, this step consists of a systematic comparison of the objects stored in these datasets. For each object, modifications that affect spatial and non-spatial properties are searched. [Lemarié and Raynal, 1996] present some tools for comparing datasets.
However, we believe that the identification of changes that occurred must be done before the integration process. Indeed, the number of changes that occurred is small compared to the volume of transferred data in a bulk transfer. If altered objects are identified in the transfer dataset before their integration, the number of objects to be integrated is limited. We present in the following paragraph how geographical features which underwent modifications may be isolated within the transfer dataset.
We use information contained in the transfer dataset to isolate altered objects. The DCM transfer is our case study. Basic evolutions are relative to the following operations:
To isolate changes that occurred, we use either the object identifier, which is a geographic identifier, or temporal information on update time.
We suggest maintaining a list of geographical identifiers for land parcels that are already integrated in the user information system. This data stems from the old dataset. Identifiers that stem from the update transfer dataset form a second list. These two lists are then compared. Table 1 presents the result of this process. The user gets a list of unchanged parcels, a list of deleted parcels (called old parcels) and a list of new parcels.
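The comparison of the two identifier lists can be sketched with set operations. This is a minimal illustration under the assumption that parcel identifiers are plain strings that stay stable while a parcel is unchanged; the function name `compare_identifiers` and the identifier values are hypothetical and do not come from the DCM.

```python
def compare_identifiers(old_ids, new_ids):
    """Classify parcel identifiers by comparing two transfers.

    old_ids: identifiers already integrated in the user information system.
    new_ids: identifiers found in the update transfer dataset.
    """
    old_set, new_set = set(old_ids), set(new_ids)
    return {
        "unchanged": sorted(old_set & new_set),  # present in both lists
        "old": sorted(old_set - new_set),        # disappeared parcels
        "new": sorted(new_set - old_set),        # parcels that appeared
    }


# Hypothetical identifiers, for illustration only.
result = compare_identifiers(
    ["AB-0012", "AB-0013", "AB-0014"],
    ["AB-0012", "AB-0015", "AB-0016"],
)
```

Here `result["old"]` holds `AB-0013` and `AB-0014`, and `result["new"]` holds `AB-0015` and `AB-0016`. Only the identifier lists are needed, so the comparison can run before any geometry is integrated.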
Objects whose temporal information (transactional time) is later than the time of the previous data transfer are considered altered. Then, for the new update dataset, the user gets a list of altered objects. However, the user has no information about changes that could affect data items already stored in the information system. It is not possible to know whether an altered item is a newly created item or a modification of an existing item. We use temporal metadata to detect altered buildings.
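The temporal filter can be sketched as follows, assuming that each object in the transfer dataset carries an update date as metadata. The record layout (identifier, date), the function name `altered_since` and the sample dates are assumptions made for the illustration.

```python
from datetime import date


def altered_since(objects, previous_transfer):
    """Return identifiers of objects updated after the previous transfer.

    objects: iterable of (identifier, update_date) pairs.
    previous_transfer: date of the previous data transfer.
    """
    return [
        object_id
        for object_id, updated in objects
        if updated > previous_transfer
    ]


# Hypothetical buildings with their update dates.
buildings = [
    ("B1", date(1997, 3, 1)),
    ("B2", date(1998, 1, 15)),
    ("B3", date(1998, 6, 2)),
]
altered = altered_since(buildings, date(1997, 12, 31))
```

The filter returns `B2` and `B3`, but, as noted above, it cannot tell whether each of them is a new building or a modified one.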
Such processes enable us to extract a subset from the bulk transfer. This subset is limited to the new state of objects which underwent modifications. Hence, only the new state for altered objects is integrated in the user information system. It is then useful to analyze filiations between the old and the new state of reference data.
Several versions of reference data coexist in the user information system. The old dataset may include objects which are no longer up to date. Moreover, objects which underwent evolutions are also stored in this information system. These new objects should then be associated with the data they replace. Therefore, from the previous results, we establish filiations between old and new objects.
The user has a list of the altered items. This list includes a set of items that appeared in the new dataset (i.e. new parcels) and a set of items that disappeared from the old dataset (i.e. old parcels).
Between successive data transfers, the cadaster performs several updates. Basic updates follow one another in a combination of division/merging/extraction or integration in the State property. There is a large number of updates if the area undergoes lots of successive cadastral reconfigurations. Hence, a new land parcel may result from several fractions of destroyed parcels. Then, to associate new and old land parcels, we use the following mechanism:
The user only has the list of the objects modified in the new dataset. In this context, we carry out the intersection between reference data already integrated in the user information system (i.e. old buildings) and the new objects (i.e. new buildings). The mechanism is similar to the previous one. However, we do not know which objects (old reference data) were modified.
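The intersection step used to associate new objects with the old objects they replace can be sketched as below. To keep the example self-contained, geometries are simplified to axis-aligned rectangles `(xmin, ymin, xmax, ymax)`; real parcels and buildings are polygons and would need a proper polygon intersection. The function names and sample coordinates are hypothetical.

```python
def overlap_area(a, b):
    """Area shared by two rectangles given as (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0.0


def filiations(old_objects, new_objects):
    """Map each new object to the old objects it overlaps.

    Objects that only touch along an edge (zero shared area) are not linked.
    """
    links = {}
    for new_id, new_box in new_objects.items():
        links[new_id] = [
            old_id
            for old_id, old_box in old_objects.items()
            if overlap_area(old_box, new_box) > 0
        ]
    return links


# Hypothetical reconfiguration: two old parcels are recut into two new ones.
old_parcels = {"P1": (0, 0, 10, 10), "P2": (10, 0, 20, 10)}
new_parcels = {"P3": (0, 0, 20, 5), "P4": (0, 5, 10, 10)}
links = filiations(old_parcels, new_parcels)
```

In this example `P3` is linked to both `P1` and `P2`, which matches the case of a new parcel built from fractions of several destroyed parcels, while `P4` is linked to `P1` only. A production version would probably ignore overlaps below an area threshold to absorb small geometric discrepancies between transfers.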
In the context of GIS, we study the problems arising from the data transfer between a producer and a user. For the user, the integration of new reference data in an existing information system is a difficult operation. Indeed, depending on the implementation chosen by the user, links between reference data and user data have to be managed. Therefore, we analyze the relations and problems arising when an update occurs. This enables us to present an approach based on the detection of changes before the integration of reference data in the user information system. We believe that our process simplifies the integration and incorporation of new reference data. The ancestry links established between instances should in turn ease the substitution of old reference data and the management of possible relations between this data and user data. This new task is the basis of our future work.
[Clement et al., 1997] Clement, G., Larouche, C., Gouin, D., Morin, P. and Kucera, H., 1997. OGDI: Toward Interoperability among Geospatial Databases. SIGMOD Record 26(3), pp. 18-23.
[Direction Générale des Impôts, 1995] Direction Générale des Impôts, 1995. Standard d'échange des objets du plan cadastral informatisé.
[Dubreil, 1996] Dubreil, F., 1996. Etude de l'intégration de la mise à jour des données BDCarto dans les bases utilisateur de l'équipement. Master's thesis, Ecole Nationale des Sciences Géographiques, Saint-Mandé France.
[Laurini and Thompson, 1992] Laurini, R. and Thompson, D., 1992. Fundamentals of Spatial Information Systems. The A.P.I.C. Series, Academic Press.
[Lemarié and Raynal, 1996] Lemarié, C. and Raynal, L., 1996. Geographic data matching: first investigations for a generic tool. In: GIS/LIS'96, pp. 405-420.
[Nyerges, 1989] Nyerges, T. L., 1989. Schema Integration Analysis for the Development of GIS Databases. Int. Journal of Geographical Information Systems 3(2), pp. 153-183.
[Spaccapietra et al., 1996] Spaccapietra, S., Parent, C. and Devogele, T., 1996. Analysis of Discrepancies in Spatial Data Representation. In: Cooperative Database Systems for Advanced Systems (CODAS), University of Kyoto and ACM Japan and ACM SIGMOD Japan, Kyoto (Japan).
[Ubeda and Egenhofer, 1997] Ubeda, T. and Egenhofer, M. J., 1997. Topological Error Correcting in GIS. In: M. Scholl and A. Voisard (eds), Advances in Spatial Databases SSD'97, LNCS 1262, Springer, pp. 283-297.