The major requirements for loading data are disk space, processing power, and processing time. In the best case, data arrive in digital form and in a format that is readily usable by the end user. Keeping data in their original format is ideal for inclusion in the library because no processing is required beyond the move from source media to cataloged object. Unfortunately, most digital data arrive in a proprietary or otherwise non-standard format and require some processing. In these cases, each dataset is evaluated individually to determine the best deliverable format. This can be as simple as bundling the dataset and its support file(s) into a single deliverable object, but more commonly it involves converting the dataset into a more readily usable format before bundling it with its support files. For any dataset processing, a routine needs to be defined so that all datasets within a data product series are processed identically. All processing must be documented and accounted for in the dataset's metadata. Spatial data often involve large files, and if the data need processing, free disk space of at least two to three times the original data size must be available for the processing steps. Non-digital data must first be digitized, e.g., by scanning or tablet digitizing, before they can be downloaded from the library.
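The disk space rule of thumb above (two to three times the original data size) can be checked before processing begins. The following is a minimal sketch in Python, not a prescribed procedure; the function names, the default factor of 3, and the idea of a separate scratch directory are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def dataset_size_bytes(path):
    """Total size of a file, or of all regular files under a directory."""
    p = Path(path)
    if p.is_file():
        return p.stat().st_size
    return sum(f.stat().st_size for f in p.rglob("*") if f.is_file())


def required_scratch_bytes(original_bytes, factor=3.0):
    """Working space to reserve: `factor` times the original data size.

    The default of 3.0 is the upper end of the two-to-three-times rule
    of thumb; it is an assumed default, not a measured requirement.
    """
    return int(original_bytes * factor)


def has_room_for_processing(dataset_path, scratch_dir, factor=3.0):
    """True if scratch_dir has at least factor x the dataset size free."""
    needed = required_scratch_bytes(dataset_size_bytes(dataset_path), factor)
    free = shutil.disk_usage(scratch_dir).free
    return free >= needed
```

A processing routine could call `has_room_for_processing` once per dataset and postpone any dataset for which it returns `False`.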
Once the data have been processed into usable format(s) and loaded into the library, maintenance is simply a matter of keeping backups of the data. If magnetic tape is the backup medium of choice, the tapes must be exercised yearly and replaced every 5 to 20 years, depending on the tape storage environment. The beauty of digital data and media is that they can be replicated without loss or degradation. Another maintenance issue is the choice of storage. For example, if total data size exceeds total available disk space, a hierarchical approach to data storage can be taken. Generally this is implemented by keeping the most frequently accessed datasets on the fastest-access storage media, e.g., hard drive. Less frequently accessed datasets can reside off-line on magnetic media or CD-ROM and be loaded as needed.
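The hierarchical storage approach described above can be illustrated with a simple greedy assignment: rank datasets by how often they are accessed and keep them on disk until the disk is full. This is only a sketch under stated assumptions; the function name, the tuple layout, and the use of an access count as the ranking measure are hypothetical, and a real library may weigh other factors.

```python
def assign_tiers(datasets, disk_capacity_bytes):
    """Split datasets into on-line (disk) and off-line (tape/CD-ROM) tiers.

    datasets: iterable of (name, size_bytes, access_count) tuples.
    Datasets are considered from most to least frequently accessed;
    each one goes on disk if it still fits, otherwise off-line.
    """
    online, offline = [], []
    remaining = disk_capacity_bytes
    ranked = sorted(datasets, key=lambda d: d[2], reverse=True)
    for name, size, _accesses in ranked:
        if size <= remaining:
            online.append(name)
            remaining -= size
        else:
            offline.append(name)
    return online, offline
```

Because the assignment is greedy, a smaller, less frequently accessed dataset may be placed on disk when a larger, more popular one no longer fits.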