Our purpose is to investigate how parallel and high-performance computation can be incorporated into the various components of a digital library so that the system runs efficiently. Topics of study include the storage and retrieval of large data sets, data conversion techniques, optimization of individual database queries, computation in heterogeneous computing environments, parallel I/O, and routing.
The Ingest component of the Alexandria system, for example, will benefit greatly from parallel processing. Wavelets are being proposed as an attractive representation of images that will facilitate efficient storage and retrieval of digitized data. Several important issues must be addressed here, such as the design of appropriate data structures for representing wavelets, algorithms for preprocessing raw data into wavelets, decomposition and combination of wavelets, and compression of wavelet-decomposed images. In the Application/User Interface component, we foresee parallel computing support for registration operations, browsing, query operations, fusion and filtering, pattern recognition, and various user-requested operations, as well as the development of parallel algorithms for application problems, particularly those that arise in GIS/EOS research. In the Catalogue component, query processing involves the execution of various data operations and the searching of multiple data files, so parallel processing can significantly improve the response time for each individual query. Parallel retrieval of image and text information in turn requires careful design of the Storage component to avoid an I/O bottleneck under parallel access.
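The Catalogue observation above, that a single query may have to search multiple data files, can be illustrated with a minimal sketch. It is not the Alexandria design: the function names `search_file` and `parallel_search`, the plain-text line-oriented file format, and the substring match are all assumptions made for illustration. The sketch uses Python's standard `concurrent.futures.ThreadPoolExecutor` to scan several files concurrently and merge the results in input order.

```python
from concurrent.futures import ThreadPoolExecutor


def search_file(path, keyword):
    """Scan one file and return (path, line_number, line) for each matching line.

    Hypothetical helper: assumes a UTF-8 text file with one record per line.
    """
    hits = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if keyword in line:
                hits.append((path, lineno, line.rstrip("\n")))
    return hits


def parallel_search(paths, keyword, max_workers=4):
    """Search all files concurrently and merge the per-file hits.

    pool.map yields results in the same order as `paths`, so the merged
    list is deterministic regardless of which worker finishes first.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_file = pool.map(lambda p: search_file(p, keyword), paths)
        return [hit for hits in per_file for hit in hits]
```

Calling `parallel_search(paths, "river")` on a list of file paths would return every matching line from every file. Threads are a reasonable fit only when the work is dominated by waiting on I/O; if all files sit on one device, the concurrent reads contend for it, which is the kind of storage bottleneck the paragraph above warns about.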
We plan to experimentally validate our developments and algorithms on real-world problems and data sets by running simulation programs on UCSB-accessible parallel computers, including the new 64-node Meiko CS-2 supercomputer the CS Department will be acquiring in March 1994 through an NSF CISE Institutional Infrastructure Grant.