THE STORAGE COMPONENT: Hierarchical Storage

To focus on the most challenging aspects of a digital library for spatially-indexed data, we are taking a conservative approach to the design of the Alexandria storage component; in particular, we are not proposing to investigate new storage technologies for Alexandria. We are therefore adopting the requirements that this component be buildable from "commercial off-the-shelf" hardware and software, and that it scale as our storage needs grow from gigabytes to terabytes.

These constraints mandate a hierarchical storage architecture, with primary (RAM), secondary (magnetic disk), and tertiary (magnetic tape jukebox) layers. Primary and secondary storage, and the movement of data between them, will be managed either by standard operating system and file system interfaces or directly by database management systems.
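By way of illustration, the following sketch (written in Python purely for exposition; the names, capacities, and latency figures are illustrative assumptions, not measurements or design commitments) models the three layers of the storage pyramid and the large disparity in their access latencies:

    # Illustrative model of the three-layer storage pyramid; the latency and
    # capacity figures below are rough orders of magnitude, not measurements.
    from dataclasses import dataclass
    from enum import Enum

    class Tier(Enum):
        PRIMARY = "RAM"
        SECONDARY = "magnetic disk"
        TERTIARY = "tape jukebox"

    @dataclass
    class TierProfile:
        tier: Tier
        typical_latency_s: float    # order-of-magnitude access latency
        typical_capacity_gb: float  # order-of-magnitude capacity

    STORAGE_PYRAMID = [
        TierProfile(Tier.PRIMARY, 1e-7, 1),        # sub-microsecond, gigabyte scale
        TierProfile(Tier.SECONDARY, 1e-2, 100),    # milliseconds, hundreds of gigabytes
        TierProfile(Tier.TERTIARY, 60.0, 10_000),  # minutes (mount + seek), terabyte scale
    ]

    def fastest_resident_tier(object_id, locations):
        """Return the fastest tier currently holding the object.

        `locations` maps object identifiers to the set of tiers holding a copy.
        """
        for profile in STORAGE_PYRAMID:
            if profile.tier in locations.get(object_id, set()):
                return profile.tier
        raise KeyError(object_id)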

Tertiary storage, and the movement of data between tertiary and secondary storage, will initially be managed by commercial hierarchical storage management software such as EpochServe, StorageServer, or UniTree. These systems all present tertiary storage as a single, gigantic file system by transparently migrating data between tertiary storage and a secondary storage cache. While logically attractive, this illusion is impossible to maintain in practice, owing to the tremendous difference in latency (milliseconds vs. minutes) between secondary and tertiary storage. To avoid exposing users to the frustration of such wild variations in response time, the storage component must be able to inform other components of the current migration status of requested data, and the other Alexandria components must be able to use this information to restructure their storage accesses, e.g., by accessing secondary and tertiary storage in parallel, or by "batching" tertiary storage accesses to be performed during off-peak times.
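As a concrete illustration of this interaction, the sketch below (Python, with hypothetical class and method names that are not part of any of the products mentioned above) shows a storage component that reports a per-object migration status, and a client that uses that status to serve disk-resident objects immediately while batching tape-resident requests for off-peak retrieval:

    # A minimal sketch, under assumed names: the storage component exposes the
    # migration status of each object, and callers use it to restructure access.
    from enum import Enum

    class MigrationStatus(Enum):
        ON_DISK = "on secondary storage (millisecond access)"
        ON_TAPE = "on tertiary storage (minutes to stage)"

    class StorageComponent:
        def __init__(self, disk_resident):
            self._disk_resident = set(disk_resident)

        def status(self, object_id):
            """Report where a requested object currently resides."""
            return (MigrationStatus.ON_DISK
                    if object_id in self._disk_resident
                    else MigrationStatus.ON_TAPE)

    def plan_accesses(storage, object_ids):
        """Split requests into those served now and those batched for off-peak."""
        serve_now, batch_for_off_peak = [], []
        for oid in object_ids:
            if storage.status(oid) is MigrationStatus.ON_DISK:
                serve_now.append(oid)
            else:
                batch_for_off_peak.append(oid)
        return serve_now, batch_for_off_peak

    storage = StorageComponent(disk_resident=["browse/quad-34", "meta/quad-34"])
    now, later = plan_accesses(storage, ["meta/quad-34", "full-res/quad-34"])
    # now   -> ["meta/quad-34"]       (disk-resident: serve immediately)
    # later -> ["full-res/quad-34"]   (tape-resident: batch for off-peak retrieval)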

Fortunately, most of the data to be managed by Alexandria has access patterns that can be at least broadly characterized a priori. Any metadata maintained by the storage component (i.e. whatever is not maintained by the catalogue component), and any dataset-specific browse products, will remain on secondary storage. For multiresolution images, the smaller, lower-resolution components will remain on secondary storage, while the larger, higher-resolution components may be migrated to tertiary storage.
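The following sketch (again Python for exposition, with hypothetical object kinds and an assumed resolution threshold) expresses this a-priori placement policy: storage-component metadata and browse products are pinned to secondary storage, while only the larger, higher-resolution levels of a multiresolution image are eligible for migration to tape:

    # Illustrative placement policy; the threshold below is an assumption chosen
    # for this example, not a value taken from the Alexandria design.
    from dataclasses import dataclass

    MAX_DISK_RESIDENT_LEVEL = 3  # levels 0..3 (coarser) stay on secondary storage

    @dataclass
    class StoredObject:
        object_id: str
        kind: str                  # "metadata", "browse", or "image_level"
        resolution_level: int = 0  # 0 = coarsest level of a multiresolution image

    def may_migrate_to_tape(obj):
        """True if the object is eligible for migration to tertiary storage."""
        if obj.kind in ("metadata", "browse"):
            return False  # always remains on secondary storage
        # Only the higher-resolution (larger) image levels may migrate.
        return obj.resolution_level > MAX_DISK_RESIDENT_LEVEL

    assert not may_migrate_to_tape(StoredObject("meta/quad-34", "metadata"))
    assert may_migrate_to_tape(StoredObject("img/quad-34/level-6", "image_level", 6))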

It should be noted that the file-based interfaces supported by current hierarchical storage management software do not lend themselves to the kind of explicit placement and migration policies that Alexandria would like to exploit. We will address this by working with software vendors to encourage the incorporation of user-specified management policies in their products, and by exploring the use of DBMS-managed hierarchical storage as this facility becomes available in commercial DBMS products.

In terms of evolution, we expect the Alexandria "storage pyramid" to expand most rapidly at the secondary storage level, since this yields the greatest payback in terms of balancing size and access time. Tertiary storage will grow slowly but steadily as our data loading efforts overflow secondary storage.


