Content-based Indexing


Database methodologies are concerned with the efficient storage and retrieval of records. A good database combines fast search with the ability to handle a wide variety of queries. Traditional database techniques have been adequate for many applications involving alphanumeric records, which can be ordered, indexed and searched for matching patterns in a straightforward manner. These methods are inadequate for image data, however. The raw intensity or color values at image points (pixels) do not lend themselves easily to interpretation at the level that is of interest to typical applications. Consequently, for image databases, research into database methodology cannot be separated from the related fundamental issues in pattern recognition and image analysis. In the context of indexing image data, the following issues stand out:

Image Features: Raw pixel data is not a good representation of information for search. Image features need to be detected by appropriate preprocessing. However, defining an image feature is an application- and data-dependent task: a feature set useful for medical images may not be the right one for satellite images. An important first task is therefore to choose the image-level features.

Similarity Measures: Once a feature set is defined, appropriate similarity measures are needed to compare feature vectors in a multidimensional scheme. In contrast with alphanumeric indexing, these similarity measures do not provide binary decisions; rather, they provide a measure (often a fuzzy one) on a continuous scale. Most of the similarity measures used in computer vision and pattern recognition research are ad hoc in nature, and there exists no systematic study evaluating the performance of these measures under different conditions.
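To make the contrast with binary matching concrete, here is a minimal sketch of a continuous similarity score between feature vectors. The exponential mapping of Euclidean distance, the `scale` parameter, and the function names are illustrative assumptions, not measures prescribed by this document; they are one ad hoc choice among many.

```python
import math


def similarity(a, b, scale=1.0):
    """Map the Euclidean distance between two feature vectors to a score in (0, 1].

    Assumption: exp(-distance / scale) is only one possible (ad hoc) choice.
    A score of 1.0 means identical vectors; the score decays towards 0 as the
    vectors move apart.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return math.exp(-dist / scale)


def rank(query, catalogue, scale=1.0):
    """Return (image_id, score) pairs sorted from most to least similar.

    `catalogue` is a hypothetical dict mapping image ids to feature vectors.
    """
    scored = [(image_id, similarity(query, vec, scale))
              for image_id, vec in catalogue.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A query therefore returns a ranked list rather than a set of exact matches; where to cut the list off is left to the application or the user.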

Dynamic Indexing: The image features to be detected, and the similarity measures defined on them, depend on the data and on the user requirements. Two users of the same data might be interested in different information contained in the images. Thus one-time, static database modeling support is inadequate for such an environment.

Our approach to addressing these problems consists of (a) defining a set of generic image features that are extracted at the time of storage, thus providing indexing based on these features, and (b) providing an evolutionary, dynamic indexing ability to create local user profiles that integrate various image processing tools with user-defined features.
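The two-part approach can be sketched as follows. This is only an illustration under stated assumptions: the class names `Catalogue` and `UserProfile`, their methods, and the representation of a feature extractor as a plain function are all hypothetical and do not describe the actual system.

```python
class Catalogue:
    """Part (a): generic features are extracted once, when an image is stored."""

    def __init__(self, generic_extractors):
        # Hypothetical mapping: feature name -> function(image) -> value.
        self.generic_extractors = dict(generic_extractors)
        self.images = {}
        self.generic_index = {}

    def store(self, image_id, image):
        self.images[image_id] = image
        self.generic_index[image_id] = {
            name: extract(image)
            for name, extract in self.generic_extractors.items()
        }


class UserProfile:
    """Part (b): a local profile adds user-defined features on top of (a)."""

    def __init__(self, catalogue):
        self.catalogue = catalogue
        self.extractors = {}
        self.cache = {}  # image_id -> {feature name -> value}

    def define_feature(self, name, extractor):
        # The profile can evolve: redefining a feature drops stale cached values.
        self.extractors[name] = extractor
        for cached in self.cache.values():
            cached.pop(name, None)

    def features(self, image_id):
        """Generic features merged with this user's features, computed lazily."""
        image = self.catalogue.images[image_id]
        cached = self.cache.setdefault(image_id, {})
        for name, extract in self.extractors.items():
            if name not in cached:
                cached[name] = extract(image)
        merged = dict(self.catalogue.generic_index[image_id])
        merged.update(cached)
        return merged
```

In this sketch the shared generic index is never modified by a profile, so two users of the same data can hold different feature sets without interfering with each other.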

Segmentation and Classification Based on Texture: Texture information is an integral part of all satellite image data. Unfortunately, in most cases it is difficult to characterize texture information precisely. Previous work in this area can be broadly classified into two groups: model-based and model-free. Model-based approaches to texture segmentation assume that the underlying texture intensity distribution can be parameterized using models such as Markov random fields (see for example [15]). The problem then reduces to estimating the model parameters and using that information in classification/segmentation [42][43]. Automating this segmentation process is a difficult problem. However, by allowing some human interaction at the time of data storage, and by coding the texture primitives appropriately, the retrieval of information can be made very efficient and robust. For example, the user can identify typical regions characterizing the texture; the computer can then estimate the model parameters (e.g., Markov random field parameters) and use these parameters later for indexing. Since retrieval of information is usually interactive, this will make possible efficient browsing of large amounts of image data.
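As an illustration of the parameter-estimation step for a user-selected region, the sketch below fits a first-order Gaussian Markov random field by least squares. The choice of a Gaussian model, the first-order (four-neighbour) neighbourhood, the least-squares estimator, and the function name `gmrf_features` are assumptions made for this example; the text does not say which model or estimator is used.

```python
def gmrf_features(region):
    """Least-squares fit of a first-order Gaussian MRF to a texture region.

    Assumed model (zero-mean intensities y):
        y[i][j] = th * (y[i][j-1] + y[i][j+1])
                + tv * (y[i-1][j] + y[i+1][j]) + noise
    `region` is a 2-D list of pixel intensities with equal-length rows,
    at least 3 x 3. Returns (th, tv, residual_variance), which could serve
    as a small texture feature vector for indexing.
    """
    rows, cols = len(region), len(region[0])
    if rows < 3 or cols < 3:
        raise ValueError("region must be at least 3 x 3")
    mean = sum(map(sum, region)) / (rows * cols)
    y = [[v - mean for v in row] for row in region]

    # Accumulate the 2 x 2 normal equations over interior pixels only.
    shh = shv = svv = bh = bv = 0.0
    samples = []
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            qh = y[i][j - 1] + y[i][j + 1]   # horizontal neighbour sum
            qv = y[i - 1][j] + y[i + 1][j]   # vertical neighbour sum
            shh += qh * qh
            shv += qh * qv
            svv += qv * qv
            bh += qh * y[i][j]
            bv += qv * y[i][j]
            samples.append((y[i][j], qh, qv))

    det = shh * svv - shv * shv
    if abs(det) < 1e-12:
        raise ValueError("degenerate region: parameters are not identifiable")
    th = (bh * svv - bv * shv) / det   # Cramer's rule
    tv = (bv * shh - bh * shv) / det
    variance = sum((c - th * qh - tv * qv) ** 2
                   for c, qh, qv in samples) / len(samples)
    return th, tv, variance
```

The three returned numbers are far more compact than the raw pixels of the region, which is what makes them attractive as index keys; a real system would likely use a larger neighbourhood and a more careful estimator.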

Contour Information: Another important feature commonly found in satellite data is the boundary between different regions of interest (for example, a coastline). Contour attributes such as length, curvature, and whether the contour is closed or open can be used as features for indexing.
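A minimal sketch of how such contour attributes might be computed, assuming a contour is already available as an ordered list of (x, y) vertices. The polyline representation, the use of total absolute turning angle as a curvature summary, and the name `contour_features` are assumptions for illustration only.

```python
import math


def contour_features(points, tol=1e-9):
    """Compute simple index features for a contour given as (x, y) vertices.

    Returns a dict with:
      length        - sum of the segment lengths
      closed        - True if the first and last vertices coincide (within tol)
      total_turning - sum of absolute turning angles in radians, used here as
                      a crude stand-in for curvature
    Assumes consecutive vertices are distinct.
    """
    if len(points) < 2:
        raise ValueError("a contour needs at least two points")
    closed = math.dist(points[0], points[-1]) <= tol
    length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))

    # For a closed contour, also count the turn at the start/end vertex.
    path = list(points)
    if closed and len(points) >= 4:
        path.append(points[1])

    total_turning = 0.0
    for p, q, r in zip(path, path[1:], path[2:]):
        heading_in = math.atan2(q[1] - p[1], q[0] - p[0])
        heading_out = math.atan2(r[1] - q[1], r[0] - q[0])
        # Wrap the heading change into [-pi, pi) before taking its magnitude.
        turn = (heading_out - heading_in + math.pi) % (2 * math.pi) - math.pi
        total_turning += abs(turn)

    return {"length": length, "closed": closed, "total_turning": total_turning}
```

For a simple convex closed contour the total turning is 2π, while a straight open segment gives 0, so this one number already separates smooth boundaries from highly convoluted ones.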

Texture and contour information can be used to segment and classify images, providing object-level metadata for indexing. We have developed methods based on Markov random fields (MRFs) to classify textures such as water, sand and vegetation, and these methods have been extended to handle synthetic aperture radar (SAR) data. Supervised classification techniques will be used at the time of data storage to extract texture classes for indexing purposes wherever appropriate.


Ron Dolin
Wed Dec 7 23:25:02 PST 1994