Abstract
Spatial databases store geographical features, which are characterised by spatial attributes: a shape and a location. The term spatial analysis generally refers to reasoning about problems that involve the spatial attributes of entities. However, before geographical data are used in spatial reasoning or spatial queries, one question must be answered: are the data stored in Geographical Information Systems (GIS) reliable? This raises the problem of data quality in GIS.
In many GIS, data were acquired without any attention to the geometric properties of geographical features. As a result, many geographical databases contain errors, especially geometric and topological errors, which prevent spatial reasoning and hinder GIS interoperability. These errors must be corrected before any spatial analysis is performed. The first goal is therefore to improve data quality by correcting and enriching GIS: correction ensures that the answers to spatial queries are reliable, and enrichment improves data access.
A set of properties that geographical data must satisfy to avoid errors in geographical databases will be given. Errors will be corrected by checking these properties and searching for situations in which the data violate them. This operation will modify the content of the databases. Correction will be performed either automatically, by dedicated programs, or semi-automatically, through a visual interface for force-fitting (correction operations are proposed to the user, who must choose one of them). Enrichment will be performed by deriving new relations, especially topological relations such as adjacency or inclusion.
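To make the idea of property checking concrete, here is a minimal sketch of a few geometric properties for a polygon stored as a closed vertex list. The representation (a list of `(x, y)` tuples whose first and last vertices coincide) and all function names are hypothetical, not taken from any particular GIS. The properties checked are: the ring is closed, it has at least three distinct vertices, and non-adjacent edges do not intersect. It also derives adjacency, one of the topological relations mentioned for enrichment, under the assumption that a shared boundary is digitised with identical vertices in both polygons.

```python
from typing import List, Set, Tuple

Point = Tuple[float, float]

def _orient(p: Point, q: Point, r: Point) -> int:
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def _on_segment(p: Point, q: Point, r: Point) -> bool:
    """For r collinear with pq: True if r lies between p and q."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))

def segments_intersect(p1: Point, p2: Point, p3: Point, p4: Point) -> bool:
    """True if the closed segments p1p2 and p3p4 share at least one point."""
    d1, d2 = _orient(p3, p4, p1), _orient(p3, p4, p2)
    d3, d4 = _orient(p1, p2, p3), _orient(p1, p2, p4)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # proper crossing
    # Touching or collinear cases: an endpoint lies on the other segment.
    return ((d1 == 0 and _on_segment(p3, p4, p1)) or (d2 == 0 and _on_segment(p3, p4, p2))
            or (d3 == 0 and _on_segment(p1, p2, p3)) or (d4 == 0 and _on_segment(p1, p2, p4)))

def check_ring(ring: List[Point]) -> List[str]:
    """Return the properties violated by a polygon ring (an empty list if none)."""
    if not ring or ring[0] != ring[-1]:
        return ["ring is not closed"]
    if len(set(ring[:-1])) < 3:
        return ["ring has fewer than 3 distinct vertices"]
    n = len(ring) - 1  # number of edges; edge i joins ring[i] and ring[i + 1]
    errors = []
    for i in range(n):
        for j in range(i + 2, n):  # j = i + 1 is adjacent and skipped
            if i == 0 and j == n - 1:
                continue  # first and last edges share the closing vertex
            if segments_intersect(ring[i], ring[i + 1], ring[j], ring[j + 1]):
                errors.append(f"edges {i} and {j} intersect")
    return errors

def _edges(ring: List[Point]) -> Set[frozenset]:
    """Undirected edges of a ring, so (a, b) and (b, a) compare equal."""
    return {frozenset((ring[i], ring[i + 1])) for i in range(len(ring) - 1)}

def adjacent(ring_a: List[Point], ring_b: List[Point]) -> bool:
    """Enrichment: two polygons are adjacent if they share at least one identical edge."""
    return bool(_edges(ring_a) & _edges(ring_b))

square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
bowtie = [(0, 0), (1, 1), (1, 0), (0, 1), (0, 0)]
right = [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)]
print(check_ring(square))       # → []
print(check_ring(bowtie))       # → ['edges 0 and 2 intersect']
print(adjacent(square, right))  # → True
```

Adjacent edges are deliberately not tested against each other, so this sketch does not detect a spike that folds back along the previous edge; a real checker would add that test and use a tolerance instead of exact coordinate equality.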
Data in GIS are stored using models designed to solve particular problems. Several representational structures have been proposed for representing spatial knowledge in different application areas: graph data models are used for networks, while spaghetti and polygonal data models are used for cadastre, among others. Usually, each model stores features of the same type (road network, land cover, cadastre, etc.) in a layer. These layers, or databases, must be reliable before any data are merged or used in reasoning that involves several sources. Each data model has properties that derive from the type and the semantics of its data. Depending on the data structure, some properties are more important, some cannot be applied, and others are useless.
A study of the most common data models is made. Its goal is to determine which properties each model stores explicitly and which must be added (or verified) to meet the requirements. However, the properties defined for correcting data are not sufficient, both because data models are diverse and because these properties ignore the semantics of the data: they rely only on the geometry of spatial objects.
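The difference between storing a property explicitly and having to derive it can be illustrated with line data. In a spaghetti model each line is an independent coordinate list, so connectivity is not recorded; a graph model stores it explicitly as nodes and edges. The sketch below derives a node-edge structure from spaghetti lines and uses node degree to flag candidate dangling ends. It assumes that endpoints which should meet have exactly identical coordinates (real data would need a snapping tolerance); the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def build_node_edge_graph(polylines: List[List[Point]]):
    """Derive explicit topology (nodes, edges, node degrees) from spaghetti polylines.

    Each polyline becomes one edge between the nodes at its first and last vertex;
    endpoints with identical coordinates are merged into a single node."""
    node_ids: Dict[Point, int] = {}
    edges: List[Tuple[int, int]] = []
    for line in polylines:
        ends = []
        for p in (line[0], line[-1]):
            if p not in node_ids:
                node_ids[p] = len(node_ids)  # new node id in order of first appearance
            ends.append(node_ids[p])
        edges.append((ends[0], ends[1]))
    degree: Dict[int, int] = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return node_ids, edges, dict(degree)

def dangling_nodes(node_ids: Dict[Point, int], degree: Dict[int, int]) -> List[Point]:
    """Nodes used by a single edge: candidates for inspection (e.g. an undershoot)."""
    return sorted(p for p, i in node_ids.items() if degree.get(i, 0) == 1)

roads = [[(0, 0), (1, 0)], [(1, 0), (2, 0)], [(1, 0), (1, 1)]]
nodes, edges, degree = build_node_edge_graph(roads)
print(edges)                          # → [(0, 1), (1, 2), (1, 3)]
print(dangling_nodes(nodes, degree))  # → [(0, 0), (1, 1), (2, 0)]
```

In a network layer a degree-1 node is not necessarily an error (a dead-end road is legitimate), which is one reason why purely geometric properties need to be complemented by semantic ones.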
A method for defining constraints on geographical data will be designed to take the semantics of the database into account. It will also allow users to customise the set of properties. These constraints will rely on topological relations between spatial objects and will be described using a visual interface. The user will only have to describe a situation involving two geographical features and attach a specification, such as "forbidden", to the scene.
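A constraint of this kind can be read as a rule over feature types rather than individual features: "a feature of type A must not be in relation R with a feature of type B". A minimal sketch under that reading follows. The `Feature` and `Constraint` records, the `forbidden_violations` function and the feature types are hypothetical; the topological relation between two features is supplied by any function that returns a relation name (for instance one based on the 9-intersection model), and here it is simply looked up in a table.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class Feature:
    name: str
    ftype: str  # semantic type of the feature, e.g. "road" or "lake"

@dataclass(frozen=True)
class Constraint:
    type_a: str
    type_b: str
    relation: str  # name of a topological relation, e.g. "overlap"
    spec: str      # e.g. "forbidden"; only "forbidden" is checked in this sketch

def forbidden_violations(features: List[Feature],
                         constraints: List[Constraint],
                         relate: Callable[[Feature, Feature], str]) -> List[Tuple[str, str, str]]:
    """Return (name_a, name_b, relation) for every ordered pair of features whose
    types and computed relation match a constraint marked "forbidden"."""
    found = []
    # Ordered pairs, because a constraint on (road, lake) need not hold for (lake, road).
    # A same-type constraint on a symmetric relation is therefore reported in both orders.
    for a, b in permutations(features, 2):
        rules = [c for c in constraints
                 if c.spec == "forbidden" and (c.type_a, c.type_b) == (a.ftype, b.ftype)]
        if not rules:
            continue  # skip computing the relation when no constraint applies
        rel = relate(a, b)
        for c in rules:
            if rel == c.relation:
                found.append((a.name, b.name, rel))
    return found

# Hypothetical data: relations are precomputed instead of derived from geometry.
relations = {("R1", "L1"): "overlap", ("L1", "R1"): "overlap",
             ("R2", "L1"): "disjoint", ("L1", "R2"): "disjoint",
             ("R1", "R2"): "meet", ("R2", "R1"): "meet"}
feats = [Feature("R1", "road"), Feature("R2", "road"), Feature("L1", "lake")]
rules = [Constraint("road", "lake", "overlap", "forbidden")]
print(forbidden_violations(feats, rules, lambda a, b: relations[(a.name, b.name)]))
# → [('R1', 'L1', 'overlap')]
```

Keeping the relation computation behind a callable separates the semantic part of a constraint (feature types and specification) from the geometric part (how the relation is computed for a given data model).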
This paper will study all the points raised in the previous paragraphs. After a presentation of the data models most commonly used in GIS, the problem of quality control will be discussed. A list of properties that geometric objects must satisfy will then be given. This list will be used to examine the data models presented and to compare their ability to handle these properties. Next, a method for defining new topological constraints will be given. The constraints will rely on topological relations defined using the 9-intersection model first described by Egenhofer. Finally, a visual interface for entering these constraints will be presented, providing an easy, visual way to describe a topological relation and to define constraints.
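In the 9-intersection model, the relation between regions A and B is described by a 3×3 matrix recording whether each intersection of the interior, boundary and exterior of A with the interior, boundary and exterior of B is empty or non-empty. For simple regions (connected, without holes) eight matrices are realisable, giving the relations disjoint, meet, overlap, equal, inside, contains, coveredBy and covers. The sketch below computes the matrix and classifies it. To keep every emptiness test exact and short, it assumes regions are closed axis-aligned rectangles (the hypothetical `Rect` type); real polygons require general computational geometry, so this is an illustration of the model, not a GIS implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    """Closed axis-aligned rectangle [x1, x2] x [y1, y2], assumed non-degenerate."""
    x1: float
    y1: float
    x2: float
    y2: float

    def __post_init__(self):
        if not (self.x1 < self.x2 and self.y1 < self.y2):
            raise ValueError("degenerate rectangle")

def _within(a: Rect, b: Rect) -> bool:
    """a is contained in b (both closed)."""
    return b.x1 <= a.x1 and a.x2 <= b.x2 and b.y1 <= a.y1 and a.y2 <= b.y2

def _within_interior(a: Rect, b: Rect) -> bool:
    """a is contained in the interior of b."""
    return b.x1 < a.x1 and a.x2 < b.x2 and b.y1 < a.y1 and a.y2 < b.y2

def _closed_meet(a: Rect, b: Rect) -> bool:
    """The closed rectangles share at least one point."""
    return max(a.x1, b.x1) <= min(a.x2, b.x2) and max(a.y1, b.y1) <= min(a.y2, b.y2)

def _interiors_meet(a: Rect, b: Rect) -> bool:
    """The interiors share at least one point."""
    return max(a.x1, b.x1) < min(a.x2, b.x2) and max(a.y1, b.y1) < min(a.y2, b.y2)

def nine_intersection(a: Rect, b: Rect):
    """Rows: interior, boundary, exterior of a; columns: the same parts of b.
    An entry is 1 if that intersection is non-empty, else 0."""
    ii = _interiors_meet(a, b)
    a_in_b, b_in_a = _within(a, b), _within(b, a)
    # Boundaries meet unless the rectangles are apart or one lies in the other's interior.
    bb = _closed_meet(a, b) and not _within_interior(a, b) and not _within_interior(b, a)
    m = ((ii, ii and not a_in_b, not a_in_b),
         (ii and not b_in_a, bb, not a_in_b),
         (not b_in_a, not b_in_a, True))  # the two exteriors always meet
    return tuple(tuple(int(x) for x in row) for row in m)

# The eight relations between simple regions, keyed by their 9-intersection matrix.
RELATIONS = {
    ((0, 0, 1), (0, 0, 1), (1, 1, 1)): "disjoint",
    ((0, 0, 1), (0, 1, 1), (1, 1, 1)): "meet",
    ((1, 1, 1), (1, 1, 1), (1, 1, 1)): "overlap",
    ((1, 0, 0), (0, 1, 0), (0, 0, 1)): "equal",
    ((1, 0, 0), (1, 0, 0), (1, 1, 1)): "inside",
    ((1, 1, 1), (0, 0, 1), (0, 0, 1)): "contains",
    ((1, 0, 0), (1, 1, 0), (1, 1, 1)): "coveredBy",
    ((1, 1, 1), (0, 1, 1), (0, 0, 1)): "covers",
}

def relate(a: Rect, b: Rect) -> str:
    """Name of the topological relation of a with respect to b."""
    return RELATIONS[nine_intersection(a, b)]

print(relate(Rect(0, 0, 1, 1), Rect(1, 0, 2, 1)))  # → meet
print(relate(Rect(1, 1, 2, 2), Rect(0, 0, 3, 3)))  # → inside
```

Because a relation is identified by its matrix, a constraint such as "forbidden" can be attached to a matrix (or to its name) independently of how the individual intersections are computed for a given data model.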