Existing library systems lack genuine database support for managing the fast-growing collections of spatio-temporal and other datasets, which include satellite images and large collections of digital datasets for elevation, vegetation, geology, soils, meteorology and hydrology. In large part, these data-handling difficulties result from using existing file systems as the repository for the datasets. The varied contents, formats and lineage of the datasets, together with their large size, result in an unmanageable collection of files scattered over a network. The absence of database modeling and management further complicates the task of maintaining these datasets, which continue to grow and evolve. In addition, efficient access to the contents of these datasets is severely restricted in the current environment. Furthermore, there is no computational support for the complex processing needs of a broad spectrum of users.
The digital spatial library will include the following spatial data resources, recognizing that some of the distinctions among them are only a matter of how far the data have evolved: digital map images (dumb images of analog maps); digital sensor images (satellite imagery, digital orthophotography); digital raster data (categorized or classified raster data from which content can be readily extracted); digital vector data (attributed points, lines and areas); video (individual video frames or sequences of frames); and software (spatial data or metadata processing software).
In addition, metadata, the explicit description of information that is implicit in the datasets, should be an integral part of any catalogue system. Although the line between data and metadata is a fine one, since one person's data is another's metadata, a digital library must at a minimum be able to identify who has what data where. The Federal Geographic Data Committee (FGDC) has proposed a draft Content Standard for Spatial Metadata, which identifies specific elements as essential for documenting and distributing digital spatial data. These elements include identification information, projection information, data custodian information, access information, status information, a data dictionary, source information, processing steps, a data quality section, and a metadata section that describes the currentness of the metadata and the contact persons. This set of elements is reasonably comprehensive with respect to digital spatial datasets and provides a starting point.
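As a rough illustration of how the FGDC element groups might be held as one structured record, and how even a thin record answers the minimal "who has what where" question, here is a Python sketch. The field names, types and the `who_has_what_where` helper are hypothetical choices made for this example; they are not the element names or structure defined by the draft standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class MetadataRecord:
    """Hypothetical record grouping the FGDC draft's element categories.

    Field names and types are illustrative assumptions, not the element
    names defined by the draft standard itself.
    """
    identification: str   # dataset title or short description
    projection: str       # projection information
    custodian: str        # data custodian ("who")
    access: str           # access information ("where")
    status: str           # status information, e.g. "complete"
    data_dictionary: List[str] = field(default_factory=list)
    sources: List[str] = field(default_factory=list)
    processing_steps: List[str] = field(default_factory=list)
    quality: Optional[str] = None           # data quality section
    metadata_date: Optional[str] = None     # metadata currentness
    metadata_contact: Optional[str] = None  # contact person for the metadata


def who_has_what_where(records: List[MetadataRecord]) -> List[Tuple[str, str, str]]:
    """Minimal catalogue function: (custodian, dataset, access) for each record."""
    return [(r.custodian, r.identification, r.access) for r in records]


# Usage with an invented record; none of these values come from a real catalogue.
catalogue = [
    MetadataRecord(
        identification="Hypothetical county soils coverage",
        projection="UTM zone 10, NAD83",
        custodian="Example county GIS office",
        access="anonymous FTP (hypothetical host)",
        status="complete",
    )
]
print(who_has_what_where(catalogue))
```

Only the first five groups are required in this sketch; the optional fields reflect the expectation that early metadata will be sparse and enriched over time.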
The metadata is itself a database and, like the data, requires a conceptually and operationally sound model. Current implementations of the draft metadata standard have employed a relational model, which has distinct disadvantages for digital library support. The proposed research will identify functional requirements for a metadata model and undertake fundamental research on structuring and organizing spatial metadata elements to support a broad range of user requests.
The metadata model must provide the information needed to search for and retrieve digital spatial data from distributed and heterogeneous databases, and the information needed to use the retrieved data successfully. For the first function, the metadata may exist separately from the data it describes. For the second, the metadata should be linked to the data and travel with it when it is retrieved; this metadata should include the schema as well as pertinent quality descriptors. This distinction corresponds to the FGDC recommendations of one mandatory set of metadata elements for data catalogues and another for data transfer.
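The two functions can be sketched as two separate structures: a catalogue entry that is searched without touching the data, and a transfer package in which the schema and quality descriptors ship alongside the payload. This is a minimal Python illustration under assumed names (`CatalogueEntry`, `TransferPackage`, `search`, `conforms`) and an assumed bounding-box search; it is not the FGDC catalogue or transfer element set.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CatalogueEntry:
    """Search-time metadata, stored apart from the data it describes."""
    dataset_id: str
    title: str
    bounding_box: Tuple[float, float, float, float]  # (west, south, east, north), degrees
    location: str  # where the data itself can be retrieved


@dataclass
class TransferPackage:
    """Use-time metadata that travels with the retrieved data."""
    dataset_id: str
    schema: Dict[str, type]   # attribute name -> expected Python type
    quality: Dict[str, str]   # quality descriptor -> value
    records: List[dict]       # the data payload itself


def search(catalogue: List[CatalogueEntry], lon: float, lat: float) -> List[CatalogueEntry]:
    """First function: find datasets whose extent covers a point, without touching the data."""
    hits = []
    for entry in catalogue:
        west, south, east, north = entry.bounding_box
        if west <= lon <= east and south <= lat <= north:
            hits.append(entry)
    return hits


def conforms(package: TransferPackage) -> bool:
    """Second function: check each record against the schema shipped with it."""
    return all(
        set(rec) == set(package.schema)
        and all(isinstance(rec[name], typ) for name, typ in package.schema.items())
        for rec in package.records
    )


# Invented example values for illustration only.
catalogue = [
    CatalogueEntry("dem-01", "Hypothetical elevation grid",
                   (-123.0, 37.0, -121.0, 38.5), "server-a:/dem-01"),
]
package = TransferPackage(
    dataset_id="dem-01",
    schema={"point_id": str, "elevation_m": float},
    quality={"vertical_accuracy": "hypothetical value"},
    records=[{"point_id": "p1", "elevation_m": 30.0}],
)
print([e.dataset_id for e in search(catalogue, -122.17, 37.43)])  # ['dem-01']
print(conforms(package))                                         # True
```

Keeping the two structures apart mirrors the split between catalogue and transfer metadata: the catalogue can stay small and centrally searchable, while the heavier schema and quality information is only needed once the data are in hand.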
The metadata model should be extensible to accommodate new data types, revised data schemas, additional indexes and key fields, more extensive data descriptions, and multiple query types. As examples, the query types below are listed in order of increasing difficulty. In the initial phases of the digital library, metadata will be limited and able to support only simple queries. As map and image segmentation, content identification and indexing evolve and the metadata are enriched, more complex queries can be supported. In the near term, much of the information needed to formulate and process the following queries must remain implicit in the data and be extracted through browsing:
(1) Document-based queries request information on maps or images as documents, e.g., find all Stanford maps published between 1920 and 1930; find a series of maps that use graduated point symbols.
(2) Feature-based queries request information on well-defined geographic features, e.g., find all representations of the Mississippi River from 1700 through 1800.
(3) Quality-based queries request information based on quality descriptors such as resolution, positional and attribute accuracy, and completeness, e.g., find images with less than 20 percent cloud cover; find a road network with correct topology and complete traffic control information.
(4) Form-based queries request information on geomorphic or anthropogenic structures (features that are not well defined), e.g., find images with center-pivot irrigation systems; find images containing alluvial fans.
(5) Process- or event-based queries request information on events of varying duration, e.g., find images of the Mississippi River in flood stage; find a set of frames that documents the duration of a phytoplankton bloom.
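The ordering of query types, and the claim that early, sparse metadata supports only the simpler ones, can be illustrated with a small Python sketch. It assumes each item carries only explicit attributes (publisher, year, kind, recorded cloud cover); "Stanford maps" is interpreted here as maps whose publisher is Stanford, which is an assumption. The class and function names are hypothetical. Feature-, form- and process-based queries are deliberately absent, since they need extracted content that these fields do not hold.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class QueryType(IntEnum):
    """Query classes from the proposal, ordered by increasing difficulty."""
    DOCUMENT = 1
    FEATURE = 2
    QUALITY = 3
    FORM = 4
    PROCESS_EVENT = 5


@dataclass
class ItemMetadata:
    """Hypothetical per-item metadata: explicit attributes only, no extracted content."""
    title: str
    publisher: str
    year: int
    kind: str                               # "map" or "image"
    cloud_cover_pct: Optional[float] = None


def document_query(items: List[ItemMetadata], publisher: str,
                   start: int, end: int) -> List[ItemMetadata]:
    """(1) Document-based: maps from a publisher within an inclusive year range."""
    return [i for i in items
            if i.kind == "map" and i.publisher == publisher and start <= i.year <= end]


def quality_query(items: List[ItemMetadata], max_cloud: float) -> List[ItemMetadata]:
    """(3) Quality-based: images whose recorded cloud cover is below a threshold."""
    return [i for i in items
            if i.kind == "image" and i.cloud_cover_pct is not None
            and i.cloud_cover_pct < max_cloud]


# Invented catalogue entries for illustration only.
items = [
    ItemMetadata("Campus map", "Stanford", 1925, "map"),
    ItemMetadata("Campus map", "Stanford", 1935, "map"),
    ItemMetadata("Scene A", "Hypothetical agency", 1993, "image", cloud_cover_pct=12.0),
    ItemMetadata("Scene B", "Hypothetical agency", 1993, "image", cloud_cover_pct=45.0),
]

print([i.year for i in document_query(items, "Stanford", 1920, 1930)])  # [1925]
print([i.title for i in quality_query(items, 20.0)])                   # ['Scene A']
```

Using an `IntEnum` lets a system compare query classes directly, for example to reject anything above the level its current metadata can support.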
Our metadata model will include descriptions of the initial data source and of the data collection and compilation methods. For each data type, the metadata will describe the physical product as well as its geographic information content.
The construction of spatial metadata descriptions, and their evolution, can benefit from the functionality of the Standard Generalized Markup Language (SGML; ISO/IEC International Standard (IS) 8879-1986). SGML models digital documents as collections of ordered hierarchies of content objects. HyTime (ISO/IEC Draft International Standard (DIS) 10744), a recent extension of SGML, is a proposed standard markup language for representing the structure of multimedia, hypertext, hypermedia, and time- and space-based documents. ``Markup'' consists mainly of ``start tags'' and ``end tags'' that respectively precede and follow each logical portion of a document. Through SGML/HyTime, any document can in principle package its information content using standard markup, and HyTime-cognizant software can browse, render, format and query HyTime-compliant documents. HyTime does not standardize user interfaces, interactions or query languages; instead, it standardizes the tasks documents perform, i.e., referring to portions of themselves or to other documents. The proposed project will investigate the use of HyTime markup to structure the content of spatial images. Developing a standard image and map description language (SIMDL) should make it possible to fully describe the content of any image or map and to link images and maps, or their content, dynamically. SIMDL markup, together with its interpretation, would provide the essential metadata for browsing and querying any spatial document.
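To make the idea of marked-up content objects concrete, the following Python sketch describes one map as an ordered hierarchy of tagged elements and then queries and links it. SIMDL does not yet exist, so every tag and attribute name here (`map`, `feature`, `type`, `seealso`) is invented for illustration. The sketch also uses XML syntax, a simplified profile of SGML, because Python's standard library has no SGML or HyTime parser; the `seealso` attribute is a stand-in and not HyTime's own linking mechanism.

```python
import xml.etree.ElementTree as ET

# Hypothetical SIMDL-style description of one map. Tag and attribute names are
# invented; XML syntax stands in for full SGML/HyTime markup.
SIMDL_DOC = """
<map id="m1" year="1925">
  <title>Hypothetical county map</title>
  <content>
    <feature id="f1" type="river" name="Mississippi River"/>
    <feature id="f2" type="road" name="State highway"/>
    <feature id="f3" type="river" name="Minnesota River" seealso="img7"/>
  </content>
</map>
""".strip()


def features_of_type(doc: str, ftype: str):
    """Return names of features of the given type, in document order."""
    root = ET.fromstring(doc)
    return [f.get("name") for f in root.iter("feature") if f.get("type") == ftype]


def outline(elem, depth=0):
    """Walk the ordered hierarchy of content objects, yielding (depth, tag) pairs."""
    yield depth, elem.tag
    for child in elem:
        yield from outline(child, depth + 1)


def links(root):
    """Collect identifiers of other documents that features point to."""
    return [f.get("seealso") for f in root.iter("feature") if f.get("seealso")]


root = ET.fromstring(SIMDL_DOC)
print(features_of_type(SIMDL_DOC, "river"))  # ['Mississippi River', 'Minnesota River']
for depth, tag in outline(root):
    print("  " * depth + tag)
print(links(root))                           # ['img7']
```

Because the content objects are explicit in the markup, a feature-based query such as "find all representations of the Mississippi River" reduces to a tree search, rather than requiring interpretation of the image itself.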