
INFORMATION SYSTEMS TEAM

IS1
Promotion-Based Indexing
PLANNED ACTIVITY
We compared the performance of LIB structures with R*-trees on real datasets from the Alexandria Digital Library. The LIB structure performs well only on the gazetteer data, which has low nesting, and not on the catalog data, which is highly nested. The reason is that the structure maintains multiple indices for each level, which is costly for highly nested data. Consequently, we proposed the promotion-based index, which incorporates the concepts of LIB structures into conventional structures such as R*-trees. The scheme improves the performance of the R*-tree on both the gazetteer and the catalog datasets. We were unable to incorporate it into the testbed because no current database system, including Illustra, permits user-level index structures. However, we plan to integrate it as a stand-alone component of the testbed.

In addition, we are investigating issues in indexing images by attributes such as texture and color. We are examining how to build an image database system that supports searches on various attributes. In this context, we designed an optimal dynamic index structure for searching in multiple dimensions. Using the concepts of this structure, we plan to develop an efficient image-retrieval system for the Alexandria Digital Library by August 1997.

ACTUAL ACTIVITY
We have made the promotion-based index a stand-alone index. However, it has not been integrated with the Alexandria testbed because the current Informix spatial datablade performs quite well. Instead, we explored how to reduce the performance bottlenecks caused by concurrent operations on a multi-dimensional index. We developed new concurrency protocols that cater to the idiosyncrasies of multi-dimensional index structures and achieve better throughput than existing ones.

We then devised new techniques for indexing image attributes such as color and texture. We proposed efficient dimensionality-reduction techniques to improve the query performance of such image databases in dynamic environments. We also examined how data replication can reduce the query time complexity of our proposed optimal structure.

IS2
Development and Analysis of a Pharos Prototype
PLANNED ACTIVITY
Begin development and analysis of a Pharos prototype.

ACTUAL ACTIVITY
We built a prototype of the automated classification component of Pharos, arguably the most difficult component to implement. We have written up various parts of this work and published a subset of the results.

IS3
Materialized Views
PLANNED ACTIVITY
Further systematic experiments will be conducted, focusing on the scalability and applicability of the materialized view approach.

ACTUAL ACTIVITY
Done

PLANNED ACTIVITY
Aid the Testbed Team in implementing the proposed approach, in which an adaptive mechanism for creating materialized views dynamically will be developed and integrated into the ADL testbed system to improve query processing.

ACTUAL ACTIVITY
The proposed materialized views were implemented in the ADL web prototype system. A translation module was built to analyze user queries and translate them into equivalent queries over the materialized views. The module was demonstrated during the site visit. Queries that were previously rejected because of unreasonable response times can now be issued, and the answers are returned within a few seconds, or a few minutes at most. This supports the feasibility of making a fully populated ADL operational.

PLANNED ACTIVITY
On the more theoretical side, research will continue on the two key issues of the materialized view technique: materialized view design and query translation.

ACTUAL ACTIVITY
A novel technique for materialized view design was proposed and published at WITS'97. An optimal view selection algorithm is proposed for the outer-join materialized view case, and a near-optimal view selection algorithm is proposed for the natural-join materialized view case. A condition was found that guarantees the selection is within 63% of the "optimal" for the natural-join case.
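The report does not describe the selection algorithm itself. As an illustration only, a 63% figure is characteristic of greedy selection, which for monotone submodular benefit functions achieves at least 1 - 1/e (about 0.632) of the optimal benefit. The sketch below assumes that setting; the function `greedy_select`, the view names, and the query costs are all hypothetical and are not taken from the WITS'97 paper.

```python
def greedy_select(candidates, query_cost, k):
    """Pick up to k views, each time taking the largest marginal benefit.

    candidates: dict mapping a (hypothetical) view name to the set of
                queries that view can answer
    query_cost: dict mapping a query to the cost saved when it is
                answered from a materialized view
    """
    chosen, covered = [], set()
    for _ in range(k):
        def gain(view):
            # Benefit counts only queries not already covered.
            return sum(query_cost[q] for q in candidates[view] - covered)

        remaining = [v for v in candidates if v not in chosen]
        best = max(remaining, key=gain, default=None)
        if best is None or gain(best) == 0:
            break
        chosen.append(best)
        covered |= candidates[best]
    return chosen, sum(query_cost[q] for q in covered)


# Made-up example data for illustration.
candidates = {
    "v_region": {"q1", "q2"},
    "v_theme": {"q2", "q3"},
    "v_date": {"q4"},
}
query_cost = {"q1": 5, "q2": 8, "q3": 2, "q4": 6}
chosen, saved = greedy_select(candidates, query_cost, k=2)
```

Here the second pick is `v_date` rather than `v_theme`, because `q2` is already covered by the first pick and so adds nothing to the marginal benefit.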

PLANNED ACTIVITY
In a more general setting, query optimization methods outside of DBMSs will become increasingly interesting as database applications grow more varied and DBMSs become more complex. The study of materialized views for improving query performance is expected to lead to opportunities for query optimization in this context, for example as efficient "glue" for multi-databases. These issues will be examined.

ACTUAL ACTIVITY
A data integration framework was developed for evaluating queries over multiple data sources. Query optimization and the application of materialized view techniques in this context are planned for future study.

IS4
Data Placement
PLANNED ACTIVITY
The problem of I/O scheduling for tertiary libraries will be studied further, with an emphasis on obtaining more general solutions. In particular, the domain will be extended to include online schedules. The study of algorithms for online problems is inherently difficult, so heuristic solutions rather than provably optimal solutions will be sought. The evaluation will be based on simulations of tertiary storage libraries. Efforts are currently underway to obtain trace data of actual user accesses to the tertiary storage at the San Diego Supercomputer Center. This data is important because no information about how users access tertiary storage is available at present. The simulations assume an access pattern based on traditional access patterns for disks, and this model may not be correct for tertiary storage. Once the trace data is obtained, it will be used to study the proposed solutions. These tests will provide a validation of the policies developed.

ACTUAL ACTIVITY
Research on tertiary storage scheduling algorithms continued. However, the direction taken was to validate the assumptions made in the earlier work rather than to look for online solutions. The team investigated the validity of the earlier assumption that minimizing the number of switches is a sound technique for improving the schedule.
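The assumption under examination can be illustrated with a toy comparison. This is a minimal sketch, not the team's scheduling algorithm: requests are modeled as hypothetical `(medium, offset)` pairs, and the only cost counted is the number of times the mounted medium changes.

```python
def count_switches(schedule):
    """Number of times the mounted medium changes while serving the schedule.

    The initial mount is not counted.
    """
    return sum(1 for prev, cur in zip(schedule, schedule[1:])
               if prev[0] != cur[0])


def group_by_medium(requests):
    """Serve all pending requests for one medium before switching.

    Media are visited in order of first appearance; within a medium,
    requests are served in ascending offset order.
    """
    first_seen = {}
    for medium, _ in requests:
        first_seen.setdefault(medium, len(first_seen))
    return sorted(requests, key=lambda r: (first_seen[r[0]], r[1]))


# Made-up request queue: (medium id, offset on that medium).
requests = [("A", 40), ("B", 10), ("A", 5), ("C", 7), ("B", 90)]
fifo_switches = count_switches(requests)
batched = group_by_medium(requests)
batched_switches = count_switches(batched)
```

Batching lowers the switch count, but it also reorders requests and can delay individual ones, so whether fewer switches actually yields a better schedule is a question that needs validation.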

PLANNED ACTIVITY
Work on prefetching large 2-dimensional data will be continued. Currently, finding an implementation of asynchronous I/O that works satisfactorily is a major problem. Earlier work done on SUN machines was found to be invalid due to the status of the implementation of asynchronous I/O on those machines. Further investigation was done on SGI machines, but there seem to be problems with the implementation of certain functions under IRIX 5.3. Several avenues for evaluating the effectiveness of prefetching are being considered, including working with raw devices, using non-POSIX4 implementations of asynchronous I/O on SUN machines, and evaluating the implementation under IRIX 6.2. We will study prefetching of data blocks when user access patterns follow a connected path over a large 2-dimensional image, as could be expected from edge detection algorithms.

ACTUAL ACTIVITY
This task was abandoned in favor of work on declustering multi-dimensional data across multiple disks, which improves the performance of range and similarity queries through parallel I/O. The change was made because of the good results obtained for two-dimensional range queries, which showed great promise for other data sets and query patterns as well.
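The report does not say which declustering scheme was used. As a minimal sketch of the general idea, the example below uses the classic disk-modulo assignment, in which a grid cell goes to disk (sum of its coordinates) mod M. The function names and the grid layout are assumptions made for illustration.

```python
from collections import Counter


def disk_modulo(cell, num_disks):
    """Assign a grid cell (tuple of integer coordinates) to a disk."""
    return sum(cell) % num_disks


def range_query_load(x_range, y_range, num_disks):
    """Count how many cells of a 2-D range query fall on each disk.

    x_range and y_range are (start, stop) pairs, stop exclusive.
    """
    load = Counter()
    for x in range(*x_range):
        for y in range(*y_range):
            load[disk_modulo((x, y), num_disks)] += 1
    return load


# A 4 x 4 range query over a grid declustered across 4 disks.
load = range_query_load((0, 4), (0, 4), num_disks=4)
```

When the cells of a range query are spread evenly, the disks can be read in parallel, and the response time is governed by the most heavily loaded disk rather than by the total number of cells.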

IS5
Extensible Data Store
PLANNED ACTIVITY
The plan for the next year is to incorporate further services into the Data Store. Additional objects to be introduced into the Data Store include Active Documents and detailed Map objects for storing the library holdings. A programmable distributed object processing system with agent-like features will also be built on top of the Data Store's messaging layer. One example of such a distributed application is processing a query against a federated database system by spawning the query to visit remote databases, processing partial queries at each site, and returning with a final answer. The Data Store must also provide a suitable user interface for client applications to invoke its services.

ACTUAL ACTIVITY
Work has proceeded as forecast in last year's research summary. It has evolved in the following ways:

  1. Refinement of the project's scope to concentrate on mobility and the distributed object model.

  2. Evolution from the Data Store to a general object framework known as StratOSphere.

  3. Work will focus on the extensibility of the StratOSphere object model, as this appears to be the most interesting area of research.






Terence R. Smith
Tue Jul 21 09:26:42 PDT 1998