Membership: Manjunath (leader), Mitra, Wang, Deng, Chae, Hatipoglu
Mission Statement of Team: Responsible for investigating a variety of issues relating to the integration of spatially referenced information objects into ADL
INTRODUCTION
Quick visual access to the stored data is essential for efficient
navigation through image collections. While textual and image content
based queries help in narrowing down the search space, visual browsing
is used to obtain an overview of the data retrieved. Low resolution
image thumbnails are one approach to such browsing. If image browsing is
performed at multiple resolutions, then the average bandwidth
requirements can be further reduced. In the last year, considerable
progress has been made in multiresolution browsing using wavelets and in
lossless compression. These include a new average interpolation
subdivision scheme for multiresolution data representation and
reversible wavelet transforms for lossless image compression. A detailed
discussion of these can be found in [1].
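As a rough illustration of the multiresolution idea (a minimal sketch only, not the average interpolation subdivision or reversible wavelet schemes of [1]), a thumbnail pyramid can be built by repeated 2x2 block averaging; a browser would transmit the coarse levels first and refine on demand:

```python
def downsample_2x2(image):
    """Halve each dimension by averaging 2x2 pixel blocks."""
    return [
        [(image[2*r][2*c] + image[2*r][2*c+1] +
          image[2*r+1][2*c] + image[2*r+1][2*c+1]) / 4.0
         for c in range(len(image[0]) // 2)]
        for r in range(len(image) // 2)
    ]

def thumbnail_pyramid(image, levels):
    """Return [full, half, quarter, ...] resolutions for coarse-to-fine browsing."""
    pyramid = [image]
    for _ in range(levels):
        pyramid.append(downsample_2x2(pyramid[-1]))
    return pyramid

# Toy 4x4 "image" with pixel values 0..15
tiny = [[float(4 * r + c) for c in range(4)] for r in range(4)]
pyr = thumbnail_pyramid(tiny, 2)
# pyr[2] is the coarsest (1x1) level: the global mean of the image
```

Transmitting `pyr[-1]` first and progressively finer levels afterwards is what reduces the average bandwidth requirement.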
Image texture provides a powerful low level image description in the
context of satellite images. Texture could be used to select a large
number of geographically salient features in airphotos, such as
vegetation patterns, parking lots, and building developments. We have
developed a texture thesaurus for aerial photographs that facilitates
fast search and retrieval of image data; it is currently being
integrated into the ADL project testbed. In addition to efficient
access, protecting intellectual property rights is another important
issue in digital libraries research. Our current work on digital
watermarking is presented in Section 3.
The following students working on this project graduated during this
year:
Wei-Ying Ma (Ph.D., June 1997)
Norbert Strobel (Ph.D., March 1998)
Morris Beatty (M.S., August 1998 (expected))
TEXTURE BASED IMAGE RETRIEVAL
In recent years image texture has emerged as an important visual
primitive to search and browse through large collections of similar
looking patterns. An image can be considered as a mosaic of textures and
texture features associated with the regions can be used to index the
image data. For instance, a user browsing an aerial image database may
want to identify all parking lots in the image collection. A parking lot
with cars parked at regular intervals is an excellent example of a
textured pattern when viewed from a distance, such as in an airphoto.
Similarly, agricultural areas and vegetation patches are other examples
of textures commonly found in aerial and satellite imagery. Examples of
queries that could be supported in this context include
``Retrieve all Landsat images of Santa Barbara which have less than
20% cloud cover'' or ``Find a vegetation patch that looks like this
region''.
In [2] we investigated the role of texture in annotating image collections and reported on several state-of-the-art texture analysis algorithms, using performance in similarity retrieval as the evaluation criterion. It is demonstrated that simple statistics computed from Gabor filtered images provide a good feature descriptor for content based search. These texture features are used in developing a texture thesaurus for fast search and retrieval [3], [4]. Salient aspects of this work are:
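The flavor of this feature extraction can be sketched as follows; the kernel size, scales, and orientations below are illustrative stand-ins, not the actual filter bank parameters used in [2]:

```python
# Sketch: filter a patch with a small bank of even Gabor kernels and keep the
# mean and standard deviation of each filtered output as the texture feature
# vector. All parameter values here are assumptions for illustration.
import numpy as np

def gabor_kernel(theta, freq, sigma=2.0, size=7):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)      # rotated coordinate
    env = np.exp(-(x**2 + y**2) / (2 * sigma**2))   # Gaussian envelope
    return env * np.cos(2 * np.pi * freq * xr)      # even (cosine) carrier

def convolve_valid(img, k):
    kh, kw = k.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(img[r:r + kh, c:c + kw] * k)
    return out

def gabor_features(img, thetas=(0, np.pi/4, np.pi/2, 3*np.pi/4), freqs=(0.2, 0.4)):
    feats = []
    for th in thetas:
        for f in freqs:
            resp = convolve_valid(img, gabor_kernel(th, f))
            feats.extend([resp.mean(), resp.std()])  # two statistics per filter
    return np.array(feats)

patch = np.indices((16, 16)).sum(axis=0) % 2 * 1.0   # checkerboard test texture
fv = gabor_features(patch)   # 4 orientations x 2 frequencies x 2 statistics
```

Two patches with similar texture produce nearby feature vectors, which is what the Euclidean-distance retrieval below relies on.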
Image Segmentation
Automated image segmentation is clearly a significant bottleneck in
enhancing the retrieval performance. Although some of the existing
systems have demonstrated a certain capability in extracting regions and
providing a region-based search, their performance on large and diverse
image collections has not been demonstrated. We believe that it is
important to localize the image feature information. Region or object
based search is more natural and intuitive than search using the whole
image information. With automated segmentation as the primary objective,
we have developed a robust segmentation scheme, called EdgeFlow, that
has yielded very promising results on a diverse collection of a few
thousand images [5].
The EdgeFlow scheme utilizes a simple predictive coding model to identify and integrate changes in visual cues such as color and texture at each pixel location. This computation constructs, at each location, a flow vector that points in the direction of the closest image boundary. The edge flow is iteratively propagated to a neighbor if that neighbor's flow points in a similar direction; the flow stops propagating if the neighbor has an opposite flow direction. In that case the two image locations have flow vectors pointing at each other, indicating the presence of a discontinuity between them.
This EdgeFlow framework results in a dynamic boundary detection scheme. The flow direction gives the direction with the most information change in the image feature space. Since any of the image attributes, such as color, texture, or their combination, can be used to define the edge flow, this provides a simple framework for integrating diverse visual cues for boundary detection. Figure 2 shows different stages of the segmentation algorithm on a small region in an aerial photograph. For computational reasons, the image texture is computed in blocks of 64 x 64 pixels. The initial texture flow vectors so computed are shown in Figure 2(b) and, after convergence, in Figure 2(c). The final detected boundaries are shown in Figure 2(d).
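A heavily simplified 1-D analogue can convey the flow-meeting rule; the real scheme works with 2-D flow vectors derived from color and texture via the predictive coding model, whereas the gradient comparison below is only a stand-in:

```python
# Toy 1-D version of the EdgeFlow idea: each sample's flow points toward the
# side with the larger intensity change, and a boundary is declared wherever
# two adjacent flows point at each other.
def edge_flow_1d(signal):
    n = len(signal)
    flow = []
    for i in range(n):
        left = abs(signal[i] - signal[max(i - 1, 0)])
        right = abs(signal[min(i + 1, n - 1)] - signal[i])
        flow.append(1 if right >= left else -1)   # +1: flow to the right
    # boundary between i and i+1 when the flows point at each other
    return [i for i in range(n - 1) if flow[i] == 1 and flow[i + 1] == -1]

edges = edge_flow_1d([0, 0, 0, 0, 5, 5, 5, 5])
# a single boundary is found between samples 3 and 4 -> edges == [3]
```

The iterative propagation step of the full algorithm is omitted here; it lets flows travel across smooth regions before the opposing-flow test is applied.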
Learning Similarity
In a typical database search, several top matching patterns are
retrieved for a given query. These matches are rank-ordered based on
their similarity to the query pattern. Ideally, a distance metric in the
texture feature space should preserve the visual pattern similarity.
Computing such a similarity is an important problem in content based
image retrieval.
To improve the retrieval performance of the texture image features, we have proposed the use of a learning algorithm [6]. A hybrid neural network algorithm is used to cluster texture patterns in the feature space. This algorithm contains two stages of training. The first stage performs unsupervised learning using the Kohonen feature map to capture the underlying feature distribution. In the second stage, clusters are labelled using a winner-takes-all representation, and class boundaries are fine-tuned using a learning vector quantization scheme. This results in a partitioning of the original feature space into clusters of visually similar patterns, based on class label information provided by human observers. Once the network is trained, search and retrieval proceed as follows: when a query pattern is presented to the system, the network first identifies a subspace of the original feature space that is more likely to contain visually similar patterns. The final retrievals are then computed using a simple Euclidean distance measure over the patterns belonging to that subspace. Beyond retrieving perceptually similar patterns, this clustering approach has the additional advantage of providing an efficient indexing tree to narrow down the search space. The cluster centers are then used to construct a visual texture image thesaurus, as explained below.
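The two training stages can be sketched schematically; the 1-D features, prototype count, learning rate, and epoch counts below are toy assumptions, whereas the system of [6] operates on high dimensional Gabor feature vectors:

```python
# Sketch of the two-stage training: an unsupervised competitive (Kohonen-style)
# pass places prototypes, then an LVQ1 pass fine-tunes them with class labels.
import random

def train_som_lvq(samples, labels, protos, epochs=50, lr=0.1, seed=0):
    rng = random.Random(seed)
    def winner(x):
        return min(range(len(protos)), key=lambda i: abs(protos[i] - x))
    # Stage 1: unsupervised learning -- move the winning prototype toward the sample
    for _ in range(epochs):
        x = rng.choice(samples)
        w = winner(x)
        protos[w] += lr * (x - protos[w])
    # Label each prototype with the class of its nearest training sample
    proto_labels = [labels[min(range(len(samples)),
                               key=lambda j: abs(samples[j] - p))] for p in protos]
    # Stage 2: LVQ1 -- pull the winner toward same-class samples, push it away otherwise
    for _ in range(epochs):
        j = rng.randrange(len(samples))
        w = winner(samples[j])
        sign = 1.0 if proto_labels[w] == labels[j] else -1.0
        protos[w] += sign * lr * (samples[j] - protos[w])
    return protos, proto_labels

samples = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]   # 1-D stand-ins for texture features
labels  = ['smooth'] * 3 + ['textured'] * 3
protos, plabels = train_som_lvq(samples, labels, [0.4, 0.8])
# the two prototypes separate toward the two labelled clusters
```

At query time, the winning prototype identifies the subspace, and the exhaustive distance computation is restricted to patterns assigned to it.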
Texture Thesaurus
A texture thesaurus can be visualized as an image counterpart of the
traditional thesaurus for text search. It contains a collection of
codewords which represent visually similar clusters in the feature
space. A subset of the airphotos was used as training data for the
hybrid neural network algorithm described earlier to create the first
level of indexing tree [4]. Within each subspace, a hierarchical vector
quantization technique was used to further partition the space into many
smaller clusters. The centroids of these clusters were used to form the
codewords in the texture thesaurus, and the training image patterns of
these centroids were used as icons to visualize the corresponding
codewords. In the current implementation, the texture thesaurus is
organized as a two-level indexing tree which contains 60 similarity
classes and about 900 codewords. Figure 3 shows some examples of the
visual codewords in the texture thesaurus designed for airphotos.
The codewords can also be associated with semantic concepts; this is
being investigated in a related project [8].
When an airphoto is ingested into the database, texture features are extracted from 64x64 subpatterns, which are then grouped to form regions. The region features are compared with the codewords in the thesaurus. Once the best match is identified, a two-way link between the image region and the corresponding codeword is created and stored as the image meta-data. At query time, the feature vector of the selected pattern is used to search for the best matching codeword, and by tracing back the links to it, all similar patterns in the database can be retrieved (Figure 4). Some examples of retrievals are shown in Figure 5 and Figure 6. An on-line web demo of this texture based retrieval can be found at http://vivaldi.ece.ucsb.edu/AirPhoto.
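The link bookkeeping described above can be sketched as follows; the codewords, feature vectors, and region identifiers are made-up toy values, and a real implementation would traverse the learned two-level indexing tree rather than a flat nearest-codeword scan:

```python
# Sketch of the codeword-link mechanism: at ingest time each region is matched
# to its nearest thesaurus codeword and linked to it; at query time the best
# codeword's links yield the candidate regions.
def nearest(codebook, feat):
    return min(codebook, key=lambda cw: sum((a - b) ** 2
                                            for a, b in zip(codebook[cw], feat)))

codebook = {'cw_parking': [1.0, 0.0], 'cw_vegetation': [0.0, 1.0]}
links = {cw: [] for cw in codebook}      # codeword -> linked image regions

def ingest(region_id, feat):
    links[nearest(codebook, feat)].append(region_id)   # stored as metadata

def query(feat):
    return links[nearest(codebook, feat)]              # trace the links back

ingest('photo1:region3', [0.9, 0.1])
ingest('photo2:region7', [0.1, 0.8])
hits = query([0.95, 0.05])
# hits == ['photo1:region3']
```

Because the heavy comparison work is done once at ingest time, a query touches only the codebook and the link lists, not the raw feature database.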
Dimensionality Reduction
Image retrieval using the texture thesaurus reduces the search
complexity by providing a tree-structured indexing while preserving the
similarity between patterns.
However, even greater improvements in retrieval efficiency are
desirable. Because the database to be searched is large and the feature
vectors are of high dimension, search complexity is still high.
Promising new results suggest that non-linear PCA may be useful in
reducing the dimension of the feature vectors without destroying too
much of the information they contain. Preliminary experiments show that
non-linear PCA can project 60 dimensional feature vectors to just 6
dimensions, while maintaining very high retrieval rates [7]. Thus an
efficient retrieval system architecture might use neural nets to
initially direct query vectors to subclasses as described above, and
then apply a non-linear projection to a lower dimension before searching
the subclass for the best matches.
The basic idea is to compute a projection map that maps the high dimensional feature vectors to a lower dimensional space subject to certain constraints. For example, the distances in the new space should approximate user provided perceptual distances between pairs of patterns.
In our experiments with this approach to dimensionality reduction, we chose a mapping that reduced the 60 dimensional feature vectors to 6 dimensional vectors. A training feature set is used to compute the parameters of the transformation. To evaluate the quality of the projection map, each database image not used in training was used as a query image. For each query, the 10 closest vectors from the database were retrieved. Ideally, the retrieved vectors should belong to the same image class as the query vector. The average correct retrieval rate over the images not used in training was 87%. For comparison, we also evaluated retrieval performance using the full 60 dimensional feature vectors and found 90% correct retrieval. Thus a factor of 10 reduction in dimension did not greatly reduce retrieval performance. For further comparison, a traditional linear PCA was used to project the vectors to 6 dimensions; this linear projection achieved only 28% correct retrieval. The results are summarized in Figure 7. Details can be found in [7].
Our long term goal is to construct a visual thesaurus for images and video in which the thesaurus codewords are created at various levels of a visual hierarchy by grouping primitives such as texture, color, shape, and motion. For complex structured patterns, these codewords take the form of a labelled graph, with the nodes representing primitive image attributes and the links representing part relationships. The approach is hierarchical and can be extended to a more complex set of image attributes.
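The evaluation protocol is easy to state in code; the data and projections below are synthetic stand-ins, not the 60 dimensional Gabor features or the trained non-linear PCA map of [7]:

```python
# Sketch of the retrieval-rate evaluation: project all feature vectors, then
# for each query count how many of its k nearest neighbors share its class.
def retrieval_rate(vectors, classes, project, k=10):
    proj = [project(v) for v in vectors]
    correct = total = 0
    for qi in range(len(proj)):
        dists = sorted((sum((a - b) ** 2 for a, b in zip(proj[qi], proj[j])), j)
                       for j in range(len(proj)) if j != qi)
        for _, j in dists[:k]:
            correct += classes[j] == classes[qi]
            total += 1
    return correct / total

# Two synthetic classes; the third coordinate carries no class information.
feats = [(0, 0, 5), (0.1, 0, 5), (0, 0.1, 5), (0.1, 0.1, 5),
         (1, 1, 5), (1.1, 1, 5), (1, 1.1, 5), (1.1, 1.1, 5)]
cls = ['veg'] * 4 + ['lot'] * 4
full  = retrieval_rate(feats, cls, lambda v: v, k=3)      # keep all dimensions
crude = retrieval_rate(feats, cls, lambda v: v[2:], k=3)  # drop the informative ones
```

A good projection map is one for which the rate after projection stays close to the rate with the full vectors, as the 87% versus 90% comparison above illustrates.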
Digital Watermarking
Intellectual property protection is another important issue for digital
media content providers and in digital libraries. One approach to this
problem is the use of digital watermarking. In digital watermarking, a
signature is embedded into the original host data and data
authentication can be done by checking for the presence of such
signatures. For images and video, these signatures could be either
visible (as in a transparent background) or invisible. The use of
invisible signatures is of interest as one can distribute the data in
its original form with little, if any, perceptual distortion.
In order to be effective, an invisible watermark should be secure,
reliable, and resistant to common signal processing operations and
intentional attacks. In our work on data embedding using signal
processing techniques, we have focussed on hiding significantly larger
amounts of signature data. This is in contrast to much of the related
work in digital watermarking where the signatures are typically binary
pseudo-random sequences. For example, we can embed signature images
which are as much as 25% of the host image data. Embedding such large
amounts of image and video data opens up other interesting applications
in security and intelligence gathering, and offers an alternative to
traditional encryption methods.
The approach we are currently investigating is based upon well established channel coding techniques using lattice structures. In this scheme, both the host and signature image data are first wavelet transformed. The signature image coefficients are then quantized to a given number of levels. The host image coefficients are grouped to form multidimensional vectors, which are then perturbed by the channel codes corresponding to the signature coefficients. After this embedding in the wavelet domain, the inverse transformation gives the watermarked image. Signature recovery follows by inverting the embedding procedure, assuming that the host image is available. Details of this scheme are available in [9] (this proceedings).
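A toy scalar version of the embed/recover cycle may help fix the idea; the multidimensional lattice channel codes of [9] are replaced here by a simple scalar perturbation, the wavelet transforms are omitted, and the quantization levels and embedding strength are illustrative assumptions:

```python
# Toy non-blind embedding: quantize signature values to LEVELS levels, perturb
# the host coefficients proportionally, and recover by inverting the perturbation
# (the original host is assumed available, as in the scheme of [9]).
LEVELS, ALPHA = 4, 0.1   # signature quantization levels, embedding strength

def embed(host, signature):
    # quantize each signature value in [0, 1) to an integer level, then perturb
    q = [min(int(s * LEVELS), LEVELS - 1) for s in signature]
    return [h + ALPHA * qi for h, qi in zip(host, q)], q

def recover(watermarked, host):
    # invert the embedding: the perturbation divided by ALPHA gives the level
    return [round((w - h) / ALPHA) for w, h in zip(watermarked, host)]

host = [10.0, -3.5, 7.2, 0.4]        # stand-ins for host wavelet coefficients
sig  = [0.05, 0.30, 0.60, 0.95]      # signature coefficients in [0, 1)
wm, q = embed(host, sig)
# recover(wm, host) == q: exact recovery when the image is not attacked
```

Robustness to compression comes from the gap between perturbation levels: small coefficient errors introduced by lossy coding still round back to the correct quantization level.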
Our preliminary experiments demonstrate that this type of embedding is robust to lossy image compression. Figure 8 shows two examples, in which the Alexandria project symbol is embedded in an aerial photograph. The two examples correspond to signature recovery under lossy wavelet compression and lossy JPEG compression. In general, good quality signature recovery is possible for up to 80% compression.
Discussion
Large spatial databases such as satellite imagery and aerial photographs
pose several challenging research problems. We have presented some
promising results on region based retrieval using texture.
Dimensionality reduction is going to be critical for large scale
databases. Representation of spatial and spatio-temporal relationships
is another important research problem. A visual thesaurus provides a
conceptual framework for addressing many of these issues. A
demonstration of an image thesaurus for airphoto browsing can be found
on the web at http://vivaldi.ece.ucsb.edu/AirPhoto. An extension of
this work that includes color and texture for natural photographs is
also available on the web at http://vivaldi.ece.ucsb.edu/Netra.
References
Abstracts of Published Papers
During the past few years several interesting applications of eigenspace representation of images have been proposed. These include face recognition, video coding, and pose estimation. However, the vision research community has largely overlooked parallel developments in signal processing and numerical linear algebra concerning efficient eigenspace updating algorithms. These new developments are significant for two reasons: Adopting them will make some of the current vision algorithms more robust and efficient. More important is the fact that incremental updating of eigenspace representations will open up new and interesting research applications in vision such as active recognition and learning. The main objective of this paper is to put these in perspective and discuss a new updating scheme for low numerical rank matrices that can be shown to be numerically stable and fast. A comparison with a non-adaptive SVD scheme shows that our algorithm achieves similar accuracy levels for image reconstruction and recognition at a significantly lower computational cost. We also illustrate applications to adaptive view selection for 3D object representation from projections.
A texture based image retrieval system for browsing large-scale aerial photographs is presented. The salient components of this system include texture feature extraction, image segmentation and grouping, learning similarity measure, and a texture thesaurus model for fast search and indexing. The texture features are computed by filtering the image with a bank of Gabor filters. This is followed by a texture gradient computation to segment each large airphoto into homogeneous regions. A hybrid neural network algorithm is used to learn the visual similarity by clustering patterns in the feature space. With learning similarity, the retrieval performance improves significantly. Finally, a texture image thesaurus is created by combining the learning similarity algorithm with a hierarchical vector quantization scheme. This thesaurus facilitates the indexing process while maintaining a good retrieval performance. Experimental results demonstrate the robustness of the overall system in searching over a large collection of airphotos and in selecting a diverse collection of geographic features such as housing developments, parking lots, highways, and airports.
We present here an implementation of NETRA, a prototype image retrieval system that uses color, texture, shape and spatial location information in segmented image regions to search and retrieve similar regions from the database. A distinguished aspect of this system is its incorporation of a robust automated image segmentation algorithm that allows object or region based search. Image segmentation significantly improves the quality of image retrieval when images contain multiple complex objects. Images are segmented into homogeneous regions at the time of ingest into the database, and image attributes that represent each of these regions are computed. In addition to image segmentation, other important components of the system include an efficient color representation, and indexing of color, texture, and shape features for fast search and retrieval. This representation allows the user to compose interesting queries such as ``retrieve all images that contain regions that have the color of object A, texture of object B, shape of object C, and lie in the upper one-third of the image'' where the individual objects could be regions belonging to different images. A Java based web implementation of NETRA is available at http://vivaldi.ece.ucsb.edu/Netra.
Currently there are quite a few image retrieval systems that use color and texture as features to search images. However, by using global features these methods retrieve results that often do not make much perceptual sense. It is necessary to constrain the feature extraction within homogeneous regions, so that the relevant information within these regions can be well represented. This paper describes our recent work on developing an image segmentation algorithm which is useful for processing large and diverse collections of image data. A compact color feature representation which is more appropriate for these segmented regions is also proposed. By using the color and texture features and a region-based search, we achieve a very good retrieval performance compared to the entire image based search.
A novel boundary detection scheme based on ``edge flow'' is proposed in this paper. This scheme utilizes a predictive coding model to identify the direction of change in color and texture at each image location at a given scale, and constructs an edge flow vector. By iteratively propagating the edge flow, the boundaries can be detected at image locations which encounter two opposite directions of flow in the stable state. A user defined image scale is the only significant control parameter that is needed by the algorithm. The scheme facilitates integration of color and texture into a single framework for boundary detection.
Progressive-resolution transmission is of significant practical importance for online image libraries. When combined with reversible image compression, it provides a particularly promising mechanism which not only contributes to lower storage overhead but also to smaller transmission costs. We propose an efficient lossless compression scheme for RGB color images. It consists of a modified reversible subband transformation which is followed by a reversible color decorrelation technique. Switching the (traditional) order of wavelet and spectral transform offers the opportunity to support progressive-resolution transmission of spectrally decorrelated wavelet coefficients without compromising compression performance.
Image decomposition based on the discrete wavelet transform (DWT) has been proposed for efficient storage and progressive transmission of images for visual browsing in digital image libraries. Although the compression aspects of the DWT have been carefully researched, reconstruction errors due to erroneously transmitted wavelet coefficients have received less attention. In this paper we consider the effect of a noisy channel on uniformly quantized wavelet coefficients and propose an error concealment method which, based on a local image model, simultaneously detects and corrects corrupted wavelet coefficients.
We present an implementation of a system for content-based search and retrieval of video based on low-level visual features. Currently the system consists of three parts: automatic video partition, feature extraction, and video search and retrieval. Three primary features, color, texture, and motion, are used for indexing. They are represented by a color histogram, Gabor texture features, and a motion histogram. Most of the processing is done directly in the MPEG compressed domain. Testing on sports and movie databases has shown good retrieval performance.
There has been much interest recently in image content based retrieval, with applications to digital libraries and image database accessing. One approach to this problem is to base retrieval from the database upon feature vectors which characterize the image texture. Since feature vectors are often high dimensional, Multi-Dimensional Scaling or Non-Linear Principal Components Analysis (PCA) may be useful in reducing feature vector size, and therefore computation time. We have investigated a variant of the non-linear PCA algorithm described in [6] and its usefulness in the database retrieval problem. The results are quite impressive; in an experiment using an aerial photo database, feature vector length was reduced by a factor of 10 without significantly reducing retrieval performance.
There is a growing need for new representations of video that allow not only compact storage of data but also content-based functionalities such as search and manipulation of objects. We present here a prototype system, called NeTra-V, that is currently being developed to address some of these content related issues. The system has a two-stage video processing structure: a global feature extraction and clustering stage, and a local feature extraction and object-based representation stage. Key aspects of the system include a new spatio-temporal segmentation and object-tracking scheme, and a hierarchical object-based video representation model. The spatio-temporal segmentation scheme combines the color/texture image segmentation and affine motion estimation techniques. Experimental results show that the proposed approach can handle large motion. The output of the segmentation, the alpha plane as it is referred to in the MPEG-4 terminology, can be used to compute local image properties. This local information forms the low-level content description module in our video representation. Experimental results illustrating spatio-temporal segmentation and tracking are provided.
An approach to embedding gray scale images using a discrete wavelet transform is proposed. The proposed scheme enables using signature images that could be as much as 25% of the host image data, and can be used both in digital watermarking and in image/data hiding. In digital watermarking the primary concern is the recovery or checking for the signature even when the embedded image has been changed by image processing operations. Thus the embedding scheme should be robust to typical operations such as low-pass filtering and lossy compression. In contrast, for data hiding applications it is important that there should not be any visible changes to the host data that is used to transmit the hidden image. In addition, in both data hiding and watermarking, it is desirable that it is difficult or impossible for unauthorized persons to recover the embedded signatures. The proposed scheme provides a simple control parameter that can be tailored to either hiding or watermarking purposes, and is robust to operations such as JPEG compression. Experimental results demonstrate that high quality recovery of the signature data is possible.
This paper addresses 3D shape recovery and motion estimation using a realistic camera model with an aperture and a shutter. The spatial blur and temporal smear effects induced by the camera's finite aperture and shutter speed are used for inferring both the shape and motion of the imaged objects.
A novel local scale controlled piecewise linear diffusion for selective smoothing and edge detection is presented. The diffusion stops at the place and time determined by the minimum reliable local scale and a spatially variant, anisotropic local noise estimate. It shows that an anisotropic, nonlinear diffusion equation using diffusion coefficients/tensors that continuously depend on the gradient is not necessary to achieve sharp, undistorted, stable edge detection across many scales. The new diffusion is anisotropic and asymmetric only at places where it needs to be, i.e., at significant edges. It not only does not diffuse across significant edges, but also enhances edges. It advances geometry-driven diffusion because it is a piecewise linear model rather than a full nonlinear model; thus it is simple to implement and analyze, and avoids the difficulties and problems associated with nonlinear diffusion. It advances local scale control by introducing spatially variant, anisotropic local noise estimation and local stopping of diffusion. The original local scale control was based on the unrealistic assumption of uniformly distributed noise independent of the image signal. The local noise estimate significantly improves local scale control.
We propose a general framework for computing invariant features from images. The proposed approach is based on a simple concept of basis expansion. It is widely applicable to many popular basis representations, such as wavelets, short-time Fourier transform, and splines.
We describe a data hiding technique which uses noise-resilient channel codes based on multidimensional lattices. A trade-off between the quantity of hidden data and the quality of the watermarked image is achieved by varying the number of quantization levels for the signature and a scale factor for data embedding. Experimental results show that the embedding remains transparent even for large amounts of hidden data, and the quality of the extracted signature is high even when the watermarked image is subjected to up to 75% compression. These results can be combined with a private key-based scheme to make unauthorized retrieval impossible, even with knowledge of the algorithm.
Creation of digital image and video libraries poses several interesting and challenging problems. New tools are needed for managing such multimedia content. These include methods to search, retrieve, and manipulate digital media by using the media content information and mechanisms to protect intellectual property rights. This paper outlines some of the recent advances in image processing as related to digital libraries in the context of the UCSB Alexandria Digital library project.
The use of singular value decomposition (SVD) values and quadratic Teager filters for image characterization is investigated. SVD values and Teager filter outputs represent different characteristics of the image texture and this property is utilized to efficiently characterize the image. Local energy values of the image texture are found by using 2-D energy filters. Filter outputs are combined with the eigenvalues obtained from SVD of local image portions and they form the feature vector for the given image. The normalized Euclidean distance between feature vectors of different images gives us a similarity measure for these images. Experimental results show that the representation is efficient in terms of storage space and retrieval accuracy.