How data might move through the system

The following figure is a generalized model of how data would flow into the system and be accessed by Internet users. The contents of each box in Figure 1 are discussed in terms of generalized processes so that specifications can be drawn up for each archive's hardware, software, and personnel requirements.

Figure 1: Data Flow

If a library item was originally in analog form, it would have to be scanned or in some way converted to a digital object before entering the library. A generalized digitizing flow diagram follows, in which various scanner technologies, connected to a network of computer workstations and supported by trained staff, provide this step in the digital archiving process.

Figure 2: Operational Library

The standards used in the processes necessary for end-to-end operation are very important. Standards are required in communications (e.g., network protocols); data management (e.g., metadata description, data format and storage, database query protocols); production (e.g., scanning, header generation, documentation); and systems (e.g., software, operating systems). These and others must work in harmony if the total distributed system is to work properly. From the archive perspective, some of these standards and protocols are more important than others and are discussed in more detail below.

ADL, for example, is working on a metadata model that incorporates two major metadata standards: the Federal Geographic Data Committee (FGDC) metadata standard and the library MARC cataloging standard. These two data description standards have been combined and are currently being used by the ADL testbed. The ADL metadata standard is not complete and must allow for growth and change because of the dynamic nature of spatially indexable data and the continuous emergence of new media.
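To make the idea of a combined record more concrete, the following is a minimal sketch of a record that carries both a spatial footprint (in the manner of a geospatial metadata standard) and bibliographic fields (in the manner of a catalog record). All class and field names here are hypothetical and are not taken from the ADL metadata model, FGDC, or MARC; the open-ended `extra` dictionary merely stands in for the requirement that the model allow for growth and change.

```python
from dataclasses import dataclass, field


@dataclass
class BoundingBox:
    """Spatial footprint in decimal degrees (hypothetical structure)."""
    west: float
    east: float
    south: float
    north: float

    def intersects(self, other: "BoundingBox") -> bool:
        # Two boxes overlap unless one lies entirely to one side of the other.
        # Assumption: no box crosses the 180-degree meridian.
        return not (
            self.east < other.west
            or other.east < self.west
            or self.north < other.south
            or other.north < self.south
        )


@dataclass
class CombinedRecord:
    """Illustrative record mixing bibliographic and spatial description."""
    title: str
    publisher: str
    call_number: str
    footprint: BoundingBox
    # Room for fields that a future revision of the model might add.
    extra: dict = field(default_factory=dict)


def search_by_area(records, query):
    """Return the records whose footprint overlaps the query box."""
    return [r for r in records if r.footprint.intersects(query)]
```

A record described this way can be found either through its catalog fields or through its footprint, which is the practical benefit of combining the two kinds of description.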

Standards for scanning, data storage, and creating ADL tools such as the ADL Gazetteer are being researched and will be tested within the UC Prototypes.

Digital storage for the operational library is a complex issue, first because each archive needs to manage and control its own data collections and second because user expectations are expanding. ADL is viewed as a distributed system; following this philosophy, the physical storage of data will most likely be distributed as well. Several storage paradigms are available. First, data storage for any archive node may be totally local, totally remote (e.g., using mass storage at ADL or a supercomputer site), or a combination of both. Second, a dataset may be shared across nodes (e.g., parts of a dataset may reside at several sites, with the segments complementing one another when combined). Third, copies of a dataset might reside at several nodes. The third option might be employed because of network considerations or to offer dataset redundancy.
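The storage paradigms described above can be sketched as a small placement catalog that records which node holds which segment of a dataset. This is only an illustration under stated assumptions: the `Catalog` class, its method names, and the labels "local", "remote", "partitioned", and "replicated" are hypothetical and do not describe any ADL software.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical record of where dataset segments are stored."""
    # dataset name -> segment name -> set of node names holding that segment
    placements: dict = field(default_factory=dict)

    def place(self, dataset, segment, node):
        """Record that `node` holds `segment` of `dataset`."""
        self.placements.setdefault(dataset, {}).setdefault(segment, set()).add(node)

    def nodes_for(self, dataset):
        """Return every node that holds any part of the dataset."""
        nodes = set()
        for holders in self.placements.get(dataset, {}).values():
            nodes |= holders
        return nodes

    def classify(self, dataset, home_node):
        """Label the placement of a dataset from the viewpoint of home_node."""
        segments = self.placements.get(dataset, {})
        if not segments:
            return "unknown"
        if any(len(holders) > 1 for holders in segments.values()):
            return "replicated"   # some segment is held by several nodes
        nodes = self.nodes_for(dataset)
        if len(nodes) > 1:
            return "partitioned"  # complementary segments at different nodes
        return "local" if nodes == {home_node} else "remote"


catalog = Catalog()
catalog.place("elevation", "whole", "node-a")
catalog.place("airphotos", "north", "node-a")
catalog.place("airphotos", "south", "node-b")
catalog.place("gazetteer", "whole", "node-a")
catalog.place("gazetteer", "whole", "node-b")
```

Keeping placement at the segment level lets one structure express all three paradigms, so a node could mix local, remote, partitioned, and replicated datasets without a separate mechanism for each.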

Storage in an operational environment must provide at least two services: the first to the archivist populating the store, the second to the user who seeks access to the data. Each service has its own requirements. The archivist might have to preprocess the data as part of the ingest function, e.g., apply a wavelet transform or compress the data. The user seeks fast access. ADL is conducting several research sub-projects centered on data delivery. These include progressive delivery of a digital file using wavelet technology; subsetting of a dataset so that only a small user-defined area is retrieved from the larger dataset; and multiple representations of the same dataset at different resolutions (e.g., if the complete file is 1 gigabyte in size, several smaller reduced-resolution files would be made available for downloading). These continue to be active research areas, and the ideal solution may not be available to the general user for some time. Until a comprehensive solution is implemented, FTP and mail may be the fastest methods of retrieving very large digital files.
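Two of the delivery ideas above, subsetting and multiple resolutions, can be illustrated with a short sketch on a gridded dataset held as a list of rows. This is not ADL's method: the function names are hypothetical, and plain block averaging is used here as a simple stand-in for the wavelet techniques under research.

```python
def subset(grid, row_start, row_stop, col_start, col_stop):
    """Return only the requested window of a 2-D grid (list of rows)."""
    return [row[col_start:col_stop] for row in grid[row_start:row_stop]]


def reduce_resolution(grid, factor):
    """Average non-overlapping factor x factor blocks into a coarser grid."""
    rows = len(grid) // factor
    cols = len(grid[0]) // factor
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            block = [
                grid[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def build_pyramid(grid, levels, factor=2):
    """Return the full grid followed by up to `levels` coarser versions."""
    pyramid = [grid]
    for _ in range(levels):
        current = pyramid[-1]
        if len(current) < factor or len(current[0]) < factor:
            break  # too small to reduce any further
        pyramid.append(reduce_resolution(current, factor))
    return pyramid


# A 4 x 4 grid holding the values 0..15, row by row.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
window = subset(grid, 1, 3, 1, 3)    # [[5, 6], [9, 10]]
pyramid = build_pyramid(grid, levels=2)
```

With a factor of 2, each level holds about a quarter of the cells of the level before it, so a user could preview a coarse level first and request a full-resolution subset only for the area of interest.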

Several staffing models are being explored, depending on archive resources and level of commitment. Since several new services will be required to operate the library, new or reassigned staff are being committed to scanning, systems management, and user services (e.g., supporting the applications required to integrate archive data into desktop software, and converting digital files between archive storage areas). An alternative might be to centralize certain processes such as scanning, dataset loading, and catalog maintenance. Each alternative will be reviewed in detail, with special attention to such things as intra-campus network speeds and data security.

Training of archive personnel would be provided by several groups, covering hardware (including networking), software for working with ADL, and applications software. Each training group would be responsible for developing its own programs, with coordination by ADL management staff.

In the spirit of cooperation, examples of datasets thought to be important to various U.C. libraries follow.

UC Berkeley
- Topographic survey [of the coasts of the United States] / U.S. Coast & Geodetic Survey. Scale 1:5,000-1:80,000. 1851? Map Room G3700 svar.U5 Case B Library has: 256 sheets, Reports T-1825-7, T-3653 NOTE: Scan Bay area 1:10,000 sheets. First geodetic survey of the coastline.
- Hammon, Jensen & Wallen. Topographic map series University of California: Berkeley California. Scale [1:480]. 1" = 40'. Oakland: Hammon, Jensen, Wallen & Associates, 1994. Map Room G4364.B5:2U5 1994.H3 Case B Library has: 61 sheets + index map NOTE: ALREADY HAVE DIGITAL FILES
- Pacific Aerial Surveys. [Aerial photography, Berkeley campus, 1994]. Scale [ca. 1:600] Oakland, CA: Hammon/Jensen/Wallen & Associates, 1994. Map Room G4364.B5:2U5A4 1994.P3 Case B Library has: 22 col. photos. NO copyright.
- Non-copyrighted maps of Bay Area cities, i.e., pre-1946 maps whose copyright has expired, esp. Oakland, Berkeley, San Francisco.
- Geologic & fault maps of the Bay Area: USGS, CDMG, Dibblee quadrangles, open-file maps. CDMG already has prototype ArcView coverage for the 1:750,000 geologic map of the state.
UCLA
- California Forest and Range Experiment Station. Vegetation type maps of California and western Nevada. Prepared by Forest Survey Staff, A. E. Wieslander in charge ... in cooperation with the University of California. [Washington, D.C., 1932-38].
- Geological Survey (U.S.) California (Los Angeles County) ... quadrangle. Washington, D.C.: U.S. Dept. of the Interior, Geological Survey, [19--?-
- Los Angeles City. Bureau of Engineering. Street Opening & Widening Division. Topographic Map sets of Santa Monica Mountains, Sunland-Tujunga-Verdugo Mountains, North-East Los Angeles, Sylmar-Granada Hills, Chatsworth Reservoir-Canoga-Park-Knapp Ranch, Baldwin Hills-Westchester-Playa Del Rey, San Pedro, LA Freeway Downtown Loop, Central LA, and Benedict Canyon. Los Angeles: Bureau of Engineering, 1959-75.
- Los Angeles (Calif.). Police Dept. Area boundaries of the Los Angeles Police Department [map]. Los Angeles, CA: Los Angeles Police Department, Cartography and Visual Aids Unit, [1992-96].
- Nature Conservancy (U.S.) Color infra-red aerial photos of Santa Cruz Island. Santa Barbara, CA: Pacific Western Aerial Surveys, 1985.
- Pafford & Associates. Topography, University of California Los Angeles. [Los Angeles, Calif.]: Pafford & Associates, [1995]
Stanford University
- Stanford University Libraries' Map Collections at Branner Earth Sciences Library hope to develop a study/research level collection in earth sciences and for social science program support.
- Recently a Biology student used a USGS Digital Elevation Model, transferred from a USGS server by FTP, with Idrisi to create a perspective image of a Rocky Mountains region in order to examine high-altitude butterfly habitats. Researchers in Geophysics and other Earth Sciences use 3-D images in oil basin and reservoir analysis and structural geology, and the library can play a support role here.
- Census Mapping and Population Studies: U.S. Census TIGER/Line files are important to these areas. One medical researcher has used these map files to compare the incidence of cancer in the Central California Valley counties from 1980 to 1990; ATLAS/GIS was used in combination with the TIGER files. Additionally, Law School students are analyzing voter districts in Los Angeles County using the TIGER mapping.
- Global Map Coverage: The Electrical Engineering Department's academic courses in communication are doing topographic analysis of large regions, using the Digital Chart of the World and digital atlases.
UCSB: See loading plan.
- The ADL database currently holds about 760,000 metadata records.
- Data files available from ADL total about a third of a terabyte.


Terence R. Smith
Tue Jul 21 09:26:42 PDT 1998