The goal of the Map and Imagery Laboratory (MIL) was to load as much data, and its associated metadata, as expeditiously as possible; when metadata was available for hard-copy datasets, that metadata was loaded as well. The effort was going well until late August 1997, when the person responsible for all metadata loading and a substantial part of the data loading resigned. For various personnel reasons, the position was not filled until April 1998. As a result, almost no metadata or data loading was done between August and mid-February. At that point existing staff were pulled away from other duties and began loading data and metadata in preparation for the end-of-March release of ADL to the University of California system as a whole.
For the most part, copyright and licensing matters have not complicated the work of loading data. The bulk of base data is generated by the Federal government and is therefore not copyrighted. The exceptions are:
- the satellite imagery of the state of California produced by SPOT Image Corporation, which is copyrighted and licensed at full resolution only for use by University of California (UC) students, faculty, and staff who are working on UC projects; and
- the Teale Data Center coverages used by the California Biodiversity researchers.
As of this report, approximately 800,000 metadata records had been loaded in the ADL Catalog. About 80 gigabytes of digital data were online, and about 350 gigabytes were awaiting processing and loading. About 150 gigabytes of the latter were sent to the San Diego Supercomputer Center (SDSC) for loading in early March.
There have been several problems in dealing with existing metadata
sets:
- Initially, ADL staff attempted to write full "crosswalks" from the schema used by each metadata collection to be ingested to the ADL Metadata Schema (AMS). It soon became clear that, at least at this stage of ADL's development, a better approach was to crosswalk only selected fields: those matching the general areas that users most often need. These general areas are called "buckets":
  - geographic area in decimal-degree coordinates;
  - beginning and end date (both date of content and date of publication or issuance);
  - genre keyword (e.g., "Maps," "Aerial photographs");
  - digital file format (e.g., JPEG);
  - controlled-list keywords (thematic; geographic; chronological/temporal);
  - author, publisher, or other originator;
  - miscellaneous free text (e.g., summary; sensor name; variant title; series name); and
  - identifier (e.g., control number; local call number; URL).
- Another major problem has been quality assurance (Q/A). SQL queries can easily find numeric fields whose values fall outside the field's domain. It is difficult or impossible, however, to use SQL to find values that lie inside the domain but are nonetheless incorrect. For example, a longitude of 30 degrees West for Yellowstone National Park is within the domain of the field (0 to 180 degrees), but it is wrong for the Park, which lies at roughly 110 degrees West. Overall, Q/A has consumed a great deal of staff time.
- Over time, ADL has become acutely aware of the need to add new fields to the metadata schema. Very often these are graphic fields such as legend, bar scale, and north arrow, without which maps and air photographs are very difficult or even impossible to use. Another badly needed field would express the accuracy of the coordinates. There is a considerable difference in accuracy between a coordinate value taken from the corners of a map sheet and the coordinate values obtained from a photomosaic for any one frame on that photomosaic.
- Building metadata for collections, both to speed up retrieval
and to provide users with a general idea of the collections in
ADL, is currently a matter of building XML pages for collections
such as the ADL Catalog and the ADL Gazetteer.
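The bucket approach amounts to a selective field mapping. The Q/A problem is the gap between a domain check, which SQL handles easily, and a plausibility check, which needs outside knowledge about the place described. The Python sketch below illustrates both ideas under stated assumptions:

- The source field names are hypothetical. Some resemble FGDC-style short names and others are invented.
- The bucket names are hypothetical and are not the actual AMS definitions.
- The Yellowstone bounding box is approximate and is used for illustration only. In practice such a box might come from a reference source like a gazetteer.

Longitudes are signed decimal degrees, with West negative.

```python
from __future__ import annotations

from typing import Any

# Hypothetical crosswalk: source-schema field name -> ADL bucket name.
# Only fields listed here are carried over; all others are ignored.
CROSSWALK = {
    "westbc": "geographic_area",
    "eastbc": "geographic_area",
    "northbc": "geographic_area",
    "southbc": "geographic_area",
    "pubdate": "dates",
    "genre": "genre_keyword",
    "format": "file_format",
    "subject": "controlled_keywords",
    "origin": "originator",
    "title": "free_text",
    "abstract": "free_text",
    "callnum": "identifier",
}


def to_buckets(record: dict[str, Any]) -> dict[str, list[tuple[str, Any]]]:
    """Copy only the crosswalked fields of a source record into buckets."""
    buckets: dict[str, list[tuple[str, Any]]] = {}
    for field, value in record.items():
        bucket = CROSSWALK.get(field)
        if bucket is not None:  # unmapped fields are simply dropped
            buckets.setdefault(bucket, []).append((field, value))
    return buckets


def in_domain(lon: float) -> bool:
    """Domain check (easy in SQL): signed decimal degrees, West negative."""
    return -180.0 <= lon <= 180.0


def plausible(lon: float, lat: float,
              box: tuple[float, float, float, float]) -> bool:
    """Plausibility check: is the point inside the place's expected
    (west, south, east, north) box? Requires outside knowledge of the place."""
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


# Approximate extent of Yellowstone National Park, for illustration only.
YELLOWSTONE_BOX = (-111.2, 44.1, -109.8, 45.2)

record = {"westbc": -111.0, "title": "Yellowstone National Park",
          "scale": "1:125,000"}
print(to_buckets(record))
# {'geographic_area': [('westbc', -111.0)],
#  'free_text': [('title', 'Yellowstone National Park')]}
# "scale" is not crosswalked, so it is dropped.

bad_lon = -30.0  # 30 degrees West: inside the domain, wrong for the Park
print(in_domain(bad_lon), plausible(bad_lon, 44.6, YELLOWSTONE_BOX))  # True False
```

The erroneous longitude passes the domain check but fails the plausibility check. Catching it requires a reference extent for every named place, which is why this kind of Q/A is so costly in staff time.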
The major problem in dealing with digital data is that many of the highest-use materials are not available in digital form, and ADL lacks the scanner needed to convert them. ADL has therefore concentrated its scanning efforts on aerial photographs of southern California. Specifically, it has scanned photographs of Santa Barbara, Ventura, and Los Angeles Counties from flights that are heavily used in hard copy. To date, approximately 3,000 of these frames have been scanned. Determining the corner coordinates of each frame has required developing a method that uses image-processing or geographic information system (GIS) software.
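Once the four corner coordinates of a scanned frame are known, the approximate geographic position of any pixel can be estimated by bilinear interpolation between the corners. The Python sketch below is offered only as an illustration, not as the method ADL used:

- The function and type names are hypothetical.
- The corner values are invented numbers near Santa Barbara.
- The frame size is hypothetical.
- Pixel (0, 0) is assumed to be the upper-left corner of the scan.

The sketch ignores lens distortion and relief displacement, so for an unrectified air photo its results are only rough.

```python
from typing import NamedTuple


class LonLat(NamedTuple):
    lon: float
    lat: float


def _lerp(a: LonLat, b: LonLat, t: float) -> LonLat:
    """Linear interpolation between two points, t in [0, 1]."""
    return LonLat(a.lon + t * (b.lon - a.lon), a.lat + t * (b.lat - a.lat))


def pixel_to_lonlat(col: float, row: float, width: int, height: int,
                    ul: LonLat, ur: LonLat, lr: LonLat, ll: LonLat) -> LonLat:
    """Approximate the lon/lat of pixel (col, row) in a scanned frame by
    bilinear interpolation between its four corner coordinates.

    (0, 0) is the upper-left pixel; (width - 1, height - 1) the lower-right.
    Ignores lens and relief distortion, so results are approximate.
    """
    if width < 2 or height < 2:
        raise ValueError("frame must be at least 2 x 2 pixels")
    u = col / (width - 1)   # 0 at the left edge, 1 at the right edge
    v = row / (height - 1)  # 0 at the top edge, 1 at the bottom edge
    top = _lerp(ul, ur, u)     # point along the top edge
    bottom = _lerp(ll, lr, u)  # point along the bottom edge
    return _lerp(top, bottom, v)


# Hypothetical corner coordinates for one scanned frame.
UL = LonLat(-119.75, 34.45)
UR = LonLat(-119.70, 34.45)
LR = LonLat(-119.70, 34.40)
LL = LonLat(-119.75, 34.40)

center = pixel_to_lonlat(500, 500, 1001, 1001, UL, UR, LR, LL)
print(center)  # approximately LonLat(lon=-119.725, lat=34.425)
```

Bilinear interpolation reproduces all four corners exactly. A six-parameter affine transform fitted to four points would in general only approximate them. Neither approach corrects for terrain relief, which would require orthorectification.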
Dataset loading has balanced two aims. The first is to find existing digital datasets, with accompanying metadata, of the kinds most frequently used in U.S. map collections. The second is to emphasize digital datasets covering California, which are of most interest to California users. Any dataset providing consistent, reasonably error-free coverage of the United States or of the Earth as a whole is also of interest.
The following datasets and their associated metadata have been loaded:
- California Biodiversity (PGBIO): GIS (geographic information
system) layers resulting from a project to map the flora and fauna
of the state of California;
- DRG (Digital Raster Graphic): scans of the most recent editions of U.S. Geological Survey topographic quadrangle maps of the United States, at various scales;
- DOQ (Digital Orthophotoquad), DOQQ (Digital Orthophoto Quarter Quad): generated from aerial photographs; the DOQs are of lower resolution than the DOQQs; the DOQs are distributed to selected libraries through the U.S. depository system, whereas the DOQQs must be purchased from the U.S. Geological Survey; the DOQQs cover the state of California, whereas the DOQs are mostly of U.S. areas east of the Rocky Mountains;
- EDAC (Earth Data Analysis Center, New Mexico): about 60
air photos and satellite images over New Mexico;
- Sierra Nevada Ecosystem Project (SNEP): ARC/INFO coverages and imagery for the Sierra Nevada;
- Space Shuttle photographs (SPACE-PIX): a selection of photographs taken with hand-held cameras by U.S. astronauts on the Space Shuttle, covering areas all over the Earth's surface;
- SPOT Image coverage of California: coverage of the entire state
at uniform resolution; images are in 15-minute quadrangles, dating
from the early 1990s; as noted above, this is a copyrighted, licensed
set, for which non-UC users may view only the thumbnails;
- U.S. Central Intelligence Agency Foreign-Country Maps: scanned versions of 8.5 x 11-inch maps of non-U.S. countries; scanning was originally performed by the Map Collection of the Perry-Castañeda Library, University of Texas at Austin; the hard-copy versions are very heavily used in U.S. map collections.
In progress but not yet loaded:
- AVHRR (Advanced Very High Resolution Radiometer): crosswalk
written between AVHRR and AMS;
- DEM (Digital Elevation Models), selected: elevation data for the United States taken from 1:250,000-scale maps, and for California and Hawaii taken from 1:24,000-scale maps; all data on 9-track tape has been moved to 8mm cassettes; the parent record is available, and child records will be generated from the file headers;
- Mojave Desert Project: AVHRR data and metadata already loaded; MSS (Landsat Multispectral Scanner) and TM (Landsat Thematic Mapper) data and metadata completed and ready to load.
Terence R. Smith
Tue Jul 21 09:26:42 PDT 1998