
GEOSPATIAL INFORMATION RESEARCH TEAM

Membership: Goodchild (leader), Carver, Geffner, Hill, Kemp, Kothuri, Larsgaard, Smith

Mission Statement of Team: Responsible for investigating a variety of issues relating to the integration of spatially referenced information objects into ADL.

ADL's major objective is to deliver the services of a distributed digital library of spatial data. Many of the problems associated with spatial data also occur in the context of other data types, and the spatial focus of ADL is transparent in many respects. Nevertheless, spatial data are often special, because library services for spatial data are different, present unique unsolved problems, or address problems that simply do not occur in other contexts. Spatial data can be defined as data about phenomena that are embedded within a spatial frame, commonly of two or three dimensions. Spatial data are often divided into two conceptual types: fields, which describe the variation of one or more properties over some part of the spatial frame; and discrete objects, which describe the locations and characteristics of primitive elements of given dimensionality. An image of Earth from space and a CAT scan of a human body are examples of fields; examples of discrete objects include the locations and attributes of incidents of crime, or the U.S. road network as a collection of links and nodes.

Geographic data are a significant subset of spatial data, although the terms geographic, spatial, and geospatial are often used interchangeably. Geographic data can be defined as a class of spatial data in which the frame is the surface of the Earth, or the surface and near-surface. ADL's primary motivation comes from geographic data, and the difficulties of providing library services for this particular class. Traditionally, geographic data took the form of paper maps and photographic images, which were accommodated within specialized map libraries, with their own methods of cataloging. A major challenge for ADL is to bring geographic data into the information mainstream, by taking advantage of digital technology to overcome many of the impediments that have forced this traditional separation.

Because of its focus on geographic data, ADL has encountered and addressed several issues that arise in this context. The transition to a digital world has significant impact on many of the conventions associated with geographic data, on arrangements for data production and dissemination, and on the ways in which data are used. During the past year ADL has made significant progress on many of these issues. The next section describes ADL's findings (Goodchild and Proctor 1997) in the area of scale, a concept of particular significance to geographic data but one which presents substantial difficulties in the transition to a digital world. This is followed by a section on fuzziness (Montello et al.), or the problems of dealing with geographic search specifications that are informal, vernacular, or uncertain. The fourth section (Goodchild 1998a) discusses the concept of a geolibrary, or a library whose primary search mechanism is based on geographic location. The fifth section (Goodchild 1998b) examines the interface between a digital library and the expanding field of geocomputation, or the modeling of complex phenomena embedded in geographic space. The final section reviews ADL conclusions (Goodchild 1998c) on the future of libraries as central services serving a dispersed population of users.

Research Activities and Progress over the Past Year

  1. Level of geographic detail
    Many geographic phenomena are almost infinitely complex, so any attempt to capture them in geographic data must involve approximation. The level of detail of a data set is an important indicator of its volume, and thus of many practical issues of storage and dissemination. Within ADL, level of detail is a major determinant of the problems users will encounter in attempting to assess whether a data set meets their needs, and of the time it will take to download. Browse images are provided in ADL to help overcome these problems, and ADL's work on wavelet decomposition and progressive transmission is also aimed in this direction.

    The traditional measure of level of detail for both paper maps and photographic images is the representative fraction, defined as the ratio of distance on the map or image to the corresponding distance on the ground. This measure is clearly ill-defined for digital data, and yet has persisted as a legacy of the earlier analog world, supported through a set of increasingly complex conventions. For example, a digital orthophoto quad (DOQ) is said to have a representative fraction of 1:12,000 because its positional accuracy meets the National Map Accuracy Standard for paper maps at that scale. Unfortunately, this convention gives no indication of the DOQ's level of geographic detail, which is unrelated to the level of detail present on a paper map of that scale.
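
    As an illustration of one such convention, the following sketch (a hypothetical helper, not part of ADL) applies the common cartographic rule of thumb that the finest resolvable feature on a paper map is on the order of 0.5 mm at map scale, converting a representative fraction into an implied ground resolution:

        def implied_ground_resolution(scale_denominator, map_detail_mm=0.5):
            """Ground distance (in meters) implied by the smallest resolvable
            feature on a paper map with representative fraction
            1:scale_denominator. Assumes the conventional rule of thumb of
            roughly 0.5 mm of detail at map scale."""
            return map_detail_mm / 1000.0 * scale_denominator

        # A 1:24,000 topographic quad implies roughly 12 m of ground detail.
        print(implied_ground_resolution(24000))  # 12.0
        # The DOQ convention of "1:12,000" would imply 6 m, which says
        # nothing about the actual pixel size of the image.
        print(implied_ground_resolution(12000))  # 6.0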

    For the object-based data models of GIS, level of geographic detail is embedded in the rules of generalization that were used to define the primitive objects. But such rules are complex and rarely explicit; often they exist only in the mind of the cartographer who drew the map from which the data were derived. Thus it seems difficult to find suitable, well-defined metrics, and we conclude that any ordinal scale is appropriate as a metric of detail, provided it forms the basis of the generalization and specification rules. There seems to be no reason to carry the representative fraction into the digital world, except as a legacy whose implications will be familiar to some users. However, the notion of extending the representative fraction to cover other forms of data, including DOQs and aerial photographs, seems particularly inappropriate.

    For the field-based data models, it is much easier to identify appropriate metrics, and we have identified suitable measures that satisfy the stated requirements in all cases except the digitized isoline data model, although we have noted potential problems for several other data models that use irregular divisions of space. Nevertheless, all of the suggested metrics are measures with linear dimension, and thus fully compatible with the way the general scientific community has traditionally defined scale (and sharply different from the way the cartographic community has defined it, as the representative fraction). Thus the tension that existed in the old culture of paper maps, between cartographers, who expressed scale as a dimensionless representative fraction, and scientists, who defined level of geographic detail as a linear measure, a tension expressed in disagreement over the meaning of ``large'', is resolved in the new culture. In the digital world, measures of level of geographic detail should always have dimensions of length.

    A further advantage of a measure of geographic detail with dimensions of length is that the ratio of the linear dimension of the data set's extent to the measure of detail is dimensionless. While we have focused largely on the small linear dimension of geographic data in this research, the large linear dimension of extent is also important, particularly when geographic phenomena are observed to have variances that grow with extent apparently without limit. We argue that this large-over-small ratio (LOS), and particularly its square, are useful measures of the volume of a geographic data set.
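
    A minimal sketch of these measures for a simple raster, with the detail metric taken as the cell size and the extent as the linear dimension of the coverage (hypothetical values, for illustration only):

        cell_size_m = 30.0      # detail metric: e.g., Landsat pixel size
        extent_m = 180000.0     # linear dimension of the data set's extent

        los = extent_m / cell_size_m      # large over small: dimensionless
        print(los)                        # 6000.0
        print(los ** 2)                   # 36000000.0, roughly the number of
                                          # cells, hence a measure of volume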

    Because the recommended metrics vary depending on the particular field data model used, it will be necessary to allow for this in the design of metadata standards; for example, TIN representations and rasters require different metrics. Metrics should also take account of the processes used to create or transform the data; for example, if a Landsat scene with a 30m pixel size is aggregated to a vector representation with a minimum mapping unit (MMU) of 1ha, the appropriate metric of geographic detail is clearly 100m (the square root of 1ha, or 10,000 square meters) and not 30m.

    We have identified several metaphors of geographic detail that may be useful in user interface design, particularly in the control of displays. However, we have been unable to find a close connection between such metaphors and appropriate metrics. What is needed is a metaphor that links clearly with the idea of a linear measure. Perhaps aspects of texture provide the appropriate metaphor, as in ``grain'' or the range of the geostatistician. Perhaps pixel size is sufficiently well understood, and the idea of an image as a collection of discrete elements sufficiently linked to the idea of discrete cells in the retina. On the other hand, perhaps it is best to separate the two functions completely, despite their close relationship in the world of paper maps: in the new culture of a digital world, the concepts needed to specify level of geographic detail in searching for suitable data may be necessarily different from those needed to control graphic displays.

    Finally, we have argued strongly that the transition to a digital world requires a reexamination of the concept of scale, and a new approach that moves away from the conventions and correlates of the world of paper maps. In the digital world, positional accuracy, level of geographic detail, and extent are all important and potentially independent properties of geographic data, and all have dimensions of length, while the representative fraction is in many cases meaningless. If chosen appropriately, the standards and measures used in the digital world to assist in processes of search, browse, and assessment of fitness for use can be much more informative and precise than legacies inherited from earlier technology.

  2. Fuzzy footprints
    In this paper we have argued that a comprehensive approach to serving digital spatial data from repositories such as ADL requires the ability to deal with query areas that are not crisply defined. The set of geographic placenames and area names in common use extends well beyond the crisply defined set of administrative units, and in many cases terms have several meanings, only some of which are crisp. It is inescapable, therefore, that many user queries will be expressed in terms of geographic areas that do not have formal, precise definitions. Moreover, there will be types of information that have geographic referents, and are thus retrievable using geographic search mechanisms, but for which the corresponding footprints are fuzzy.

    Several issues must be dealt with if fuzzy footprints are to be incorporated into digital spatial data search and retrieval mechanisms. We have experimented with three methods of formal representation of fuzzy footprints: a crisp polygon degraded by a simple mathematical function; a radially symmetric function; and a raster representation of a general surface. We have examined appropriate methods for eliciting such representations from users, both before and during the process of search. A number of methods of display have been implemented, in an effort to find methods that are as informative as possible to the user. Finally, we have experimented with metrics of the goodness of fit between fuzzy or crisp representations of the user's area of interest and fuzzy or crisp representations of an information object's footprint. We have assumed throughout that fuzzy footprints can be described by one of a set of simple models, though there may be instances where none of these models is satisfactory. Our approach also assumes that a single model is appropriate, but there will certainly be instances where one group's concept of a fuzzy region differs from that of another group.
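
    As one plausible instantiation of these ideas (a sketch, not the project's actual implementation), the following builds raster membership surfaces from a radially symmetric function and scores the fit between a fuzzy query area and a fuzzy footprint with a fuzzy analogue of the intersection-over-union ratio:

        import numpy as np

        def radial_membership(shape, center, radius):
            """Raster of fuzzy membership values: 1 at the center, declining
            linearly to 0 at the given radius (a radially symmetric model)."""
            rows, cols = np.indices(shape)
            d = np.hypot(rows - center[0], cols - center[1])
            return np.clip(1.0 - d / radius, 0.0, 1.0)

        def fuzzy_fit(query, footprint):
            """Goodness of fit between two membership rasters: the sum of
            cell-wise minima over the sum of cell-wise maxima."""
            union = np.maximum(query, footprint).sum()
            return np.minimum(query, footprint).sum() / union if union else 0.0

        # A user's fuzzy area of interest versus an information object's
        # fuzzy footprint, defined on a common 100x100 grid.
        query = radial_membership((100, 100), center=(50, 50), radius=30)
        footprint = radial_membership((100, 100), center=(60, 55), radius=25)
        print(round(fuzzy_fit(query, footprint), 3))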

    The world of distributed information on the Internet is fundamentally different from that of the traditional library. The library culture is highly structured, with formal standards and rules of procedure. But the Internet culture is essentially anarchic, permitting anyone to serve and retrieve information without any discernible rules governing content. In the traditional map library there were human assistants who could help a user to define a query in a form that was consistent with the rules of organization of the library's information. In the distributed Internet world there are no human assistants to help the user adapt to the system; instead, the system must adapt to the user. Thus approaches such as those we have proposed in this paper are an essential part of an Internet-based world, supporting the user's need to pose queries that do not follow officially recognized ways of describing the world.

  3. The geolibrary
    Many types of information refer to specific places on the Earth's surface. They include reports about the environmental status of regions, photographs of landscapes, images of Earth from space, guidebooks to major cities, municipal plans, and even sounds and pieces of music. All of these are examples of information that is georeferenced because it has some form of geographic footprint.

    A geolibrary is a library filled with georeferenced information. Information is found and retrieved by matching the area for which information is needed with the footprints of items in the library, and by matching other requirements—but the footprints always provide the primary basis of search. A geolibrary can handle queries like ``What information do you have about this neighborhood?'' ``Do you have a guidebook covering this area?'' ``Can I find any further information about the area in which the Bronte sisters lived?'' or ``What photographs do you have of this area?'' In all of these queries the geographic footprint provides the primary basis of search.
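
    A minimal sketch of such footprint-based search, assuming rectangular footprints for simplicity (the catalog entries are illustrative; real footprints may be polygons or fuzzy surfaces):

        # Footprints as (west, south, east, north) bounding boxes in degrees.
        catalog = {
            "Goleta quadrangle, 1:24,000": (-119.95, 34.40, -119.82, 34.48),
            "Santa Barbara aerial photo": (-119.75, 34.38, -119.65, 34.45),
            "California highway network": (-124.5, 32.5, -114.1, 42.0),
        }

        def overlaps(a, b):
            """True if two bounding boxes intersect."""
            return (a[0] <= b[2] and b[0] <= a[2] and
                    a[1] <= b[3] and b[1] <= a[3])

        # ``What information do you have about this area?''
        query = (-119.90, 34.39, -119.70, 34.46)
        print([title for title, fp in catalog.items() if overlaps(query, fp)])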

    In the physical world it is essentially impossible to build a geolibrary, although conventional map libraries come as close as it is possible to come. In a digital world, however, these impediments disappear. The user of a digital geolibrary can be presented with a globe; can zoom to the appropriate level of detail; can access lists of placenames and see their footprints; and can move up or down the placename hierarchy using links between places. Moreover, a digital geolibrary solves the problem of physical access, if the services of the library are provided over a universal network like the Internet. And finally, the collection of a geolibrary can be dispersed: a digital geolibrary can consist of a collection of servers, each specializing in materials about its local region.

    In the first instance, a geolibrary could provide an interesting new access mechanism to the contents of a conventional library: for example, a tourist planning to visit Paris would have a powerful new way of finding guidebooks. But the full power of the geolibrary lies in its ability to provide access to types of information not normally found in libraries. First, the geolibrary would be a multimedia store. Because everything in a digital library is stored using the same digital media, there are none of the physical problems that traditional libraries have had to face in handling special items like photographs, sound, or video. Second, the geolibrary could focus on building a collection of items of special or local interest. In a networked world there is no need for libraries to duplicate each other's contents; it may be sufficient for one digital copy of the Rouen municipal plan to be available in a digital geolibrary located in Rouen, or perhaps in Paris, provided other geolibraries know it is there. Third, much georeferenced information is unpublished or fugitive, because interest in it tends to be geographically defined. The geolibrary provides an effective mechanism for collecting such information in special, local collections, and making it widely available.

    In summary, the contents of a geolibrary would be very different from those of a conventional physical library. They would be dominated by multimedia information of local interest, in fact precisely the kinds of information needed by an informed citizenry, and one that is deeply involved in issues affecting its neighborhood, region, and planet. Because its contents would be different, a geolibrary might attract an entirely new type of library user.

  4. Support for geocomputation
    Geocomputation has a large appetite for data. While the literature on cellular automata and artificial life shows that it is possible to build interesting simulations within an undifferentiated spatial frame with virtually no input except model parameters, geocomputation focuses on modeling processes on geographic landscapes that can be sharply differentiated. Its processes respond to assorted boundary and initial conditions, and these must therefore be represented by input of appropriate geographic data. The parameters of geocomputational models may also be spatially variable, and must be represented with potentially extensive input data. In both of these cases the data serve to differentiate the geographic landscape, and are therefore geographic in the traditional sense, representing the variation of conditions or attributes over geographic space: in general, f(x,y), where x and y are positional variables and f is an attribute. Geocomputation is similarly a heavy producer of data, and requires tools for the analysis and display of voluminous simulation results.
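
    In code, a field f(x,y) is typically an array of attribute values together with a georeferencing rule that maps positions to cell indices; a minimal sketch (illustrative names and values only):

        import numpy as np

        # A field f(x,y): elevation sampled on a regular 400x400 grid.
        elevation = np.random.default_rng(0).uniform(0, 500, size=(400, 400))

        # Georeferencing: a simple affine rule mapping ground coordinates
        # (here hypothetical UTM meters) to a cell of the grid.
        origin_x, origin_y = 500000.0, 3800000.0  # upper-left corner
        cell = 30.0                               # cell size in meters

        def f(x, y):
            """Look up the field value at ground position (x, y)."""
            col = int((x - origin_x) / cell)
            row = int((origin_y - y) / cell)
            return elevation[row, col]

        print(f(506000.0, 3794000.0))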

    A complex set of arrangements has evolved for the production and dissemination of geographic data, and these form the data supply context for much of geocomputation. Recently this system has been revolutionised by the arrival of the Internet and the World Wide Web (WWW), which have removed almost entirely the costs and delays associated with traditional dissemination methods. In this new world, data are to be found in widely distributed archives, ranging in size from personal servers built by individuals to make small data sets available to colleagues, to the massive servers maintained by the U.S. Geological Survey's EROS Data Center, or the U.S. National Aeronautics and Space Administration's EOS-DIS (Earth Observing System Data and Information System) for dissemination of vast amounts of Earth imagery and other data. To find data in this loosely coordinated and vastly complex environment, the user needing data for a specific purpose must somehow:

    1. specify that need in terms whose meaning is widely understood;
    2. initiate a systematic process of search;
    3. assess the suitability for use of any item identified as potentially useful by the search process;
    4. retrieve the data using available communication channels; and
    5. open the data for use by a local application.
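
    As a rough illustration of stages 1 through 3, the following sketch filters a set of metadata records by bounding box, theme, and level of geographic detail; the record fields are hypothetical stand-ins, not an actual metadata standard:

        # Hypothetical metadata records; a real catalog would carry much
        # richer descriptions (e.g., FGDC-style elements).
        records = [
            {"title": "DEM, Santa Ynez Mountains", "theme": "elevation",
             "bbox": (-120.1, 34.4, -119.6, 34.6), "detail_m": 30},
            {"title": "Landsat TM scene", "theme": "imagery",
             "bbox": (-120.5, 34.0, -118.9, 35.2), "detail_m": 30},
        ]

        def search(bbox, theme, max_detail_m):
            """Stages 1-3: specify the need, search the records, and screen
            candidates for fitness of use (here, by level of detail)."""
            w, s, e, n = bbox
            return [r["title"] for r in records
                    if r["bbox"][0] <= e and w <= r["bbox"][2]
                    and r["bbox"][1] <= n and s <= r["bbox"][3]
                    and r["theme"] == theme and r["detail_m"] <= max_detail_m]

        print(search((-120.0, 34.3, -119.5, 34.7), "elevation", 50))
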
    This new framework differs markedly from its traditional precursor, which relied extensively on individual expertise and assistance. In most cases the potential user of data would have been a spatially-aware professional (SAP) with knowledge of a specialised vocabulary shared with other SAPs. He or she would have interacted with a custodian of data, perhaps at a map library or in a government office, and the telephone number of the custodian or a previous user may have been entirely sufficient to provide the necessary information about the data in question. Data would have been supplied on tape, perhaps by mailing, or in hard copy form to be digitised by the user. Much time would have been spent making the data compatible with the local application, perhaps by reformatting. Much of the available data would have been produced centrally, by a government department funded at public expense, whereas today data are increasingly available from individuals, or local agencies. With central domination of production, it was possible for uniform standards to be imposed; today, a plethora of standards have emerged as a result of marketplace competition and local autonomy.

    In short, geocomputation, with its extensive data demands, is arriving as a novel paradigm at a time when many traditional arrangements for production and dissemination of geographic data are breaking down, and are being replaced by a much more flexible, localised, autonomous, and chaotic system that is at the same time much richer, with far more to offer. While new technology has made far more data available, it has also created massive problems in making effective use of its potential. Paradoxically, only the technology itself can provide the basis of solutions. The purpose of this chapter is to examine efforts to deal with these issues, and specifically to provide tools for tackling the five stages identified above.

  5. Libraries as central services
    Libraries fall into the category of central facilities serving a dispersed population. In some cases libraries serve the general population; research libraries serve special populations. In the latter case demand is largely confined to research institutions, where the population requiring the services of a research library is concentrated. Libraries provide access to information, largely in the form of printed text, but also including maps, music, photographs, and other information formats and media. A library's primary function is to place the medium carrying this information in the hands of the user, and this function is supported by ancillary services such as cataloging, which allows users to find information in the library's collection, and circulation, which ensures that the medium returns to the library for use by others. Libraries also provide a host of less obvious functions, sometimes serving as community centers, sites for training in various aspects of information retrieval, and environments conducive to scholarly research.

    A system of central facilities evolves under given patterns of consumer behavior, the technology of transportation, the economics of service provision, and consumer demand. When any of these change, the system attempts to adjust; facilities are added, deleted, or moved in response, as the system works towards a new optimum. In the case of libraries, the transition to digital information handling is engendering changes in many aspects of the central facilities model, including access (a transition from physical access and delivery of media to access through electronic networks and delivery of bits); economies of scale (physical libraries replaced by digital servers); and consumer behavior (consumers have increasing numbers of choices). The purpose of this paper is to explore the implications of these changes for libraries, within the context of central facilities location theory, and thus to anticipate the geographic restructuring that can be expected to occur; in addition, the paper focuses on the complications that result when the information being handled is itself geographic in nature.

    The future map of research libraries will look very different from today's. Instead of the classical pattern of central service provision, it will be sufficient for each information-bearing object (IBO) to be available from only a small number of servers; and under perfect connectivity, from only one. A research library will be able to focus on serving only those IBOs that are of particular relevance to its local role. Its responsibility to a geographically-defined constituency argues for it to serve those IGDIs (IBOs of geographically-determined interest) whose footprints overlap its domain, or to provide indirect links to the respective custodians. The library's responsibility to its scholars argues for it to serve the results of their research and their contributions to the corpus of human knowledge; or to provide indirect access to IBOs on each scholar's personal server (though it will likely be argued that the institution is more persistent than the location of the scholar). The library may also serve IBOs that are of particular relevance to the interests of its scholars; or collections of archival IBOs that are analogous to today's special collections. In all other cases, however, the institution will rely not on its own library but on services provided collectively, and paid for collectively. The research library of the digital world will be a much more specialized entity, reflecting the effectively infinite range and zero threshold of library service provision in the digital world.

    If such changes in arrangements are implied by changing economics and technology, then one can legitimately ask how the transition will occur. What steps will be needed to ensure that the transition from central to local production is as painless as possible, and similarly for the transition between research library as central facility, and research library as special collection? In one view, such changes in institutional arrangements are impossible to achieve smoothly, and can only occur through invention of entirely new institutions, and abandonment of old ones. Thus the digital library must be built alongside the existing one, but reflecting entirely new principles of organization and responsibility. This strategy leads inevitably to institutional conflict, as new and old arrangements compete for available resources; and the inevitable decline of the old arrangements can be very painful and wasteful. But smooth transition is possible only if there is consensus within the old institutions of the need for change, and a shared vision of how it can be achieved.

Abstracts of published papers

[1]
Goodchild M F 1998a The geolibrary. In S Carver (ed) Innovations in GIS V. London: Taylor and Francis

Many types of information have geographic footprints; the set includes maps and Earth images, and also guidebooks, reports, photographs, and even certain pieces of music. A geolibrary is defined as a library in which geographic footprints provide the primary search mechanism, rather than author, title, or subject. It is argued that a geolibrary cannot exist in the analog world, but is feasible in a digital world. Because its basis of search is different, a geolibrary might be used to access information not commonly found in traditional libraries.

[2]
Goodchild M F 1998b Different data sources and diverse data structures: metadata and other solutions. In P A Longley, S Brooks, W Macmillan, R McDonnell (eds) Geocomputation: A Primer. Cambridge: GeoInformation International

Geocomputation almost by definition requires access to large quantities of geographic data, and it is increasingly common for such data to be supplied using technologies that support search and retrieval over distributed archives, such as the World Wide Web. It is essential therefore that it be possible to define the characteristics of needed data; to search for suitable sources among archives scattered over a potentially vast distributed network; to evaluate the fitness of a given data set for use; and to retrieve it successfully. These stages require the development of an array of tools, and associated standards and protocols. The term metadata is commonly used to refer to languages designed for the description of the contents of a data set, to facilitate its discovery and evaluation by a search engine, as well as its successful transmission and opening by the user's application. In the area of geographic data, the most widely known metadata standard is the Content Standard for Digital Geospatial Metadata, developed and implemented by the U.S. Federal Geographic Data Committee. The chapter reviews the issues raised by the foregoing outline.

[3]
Goodchild M F 1998c Towards a geography of geographic information in a digital world. Computers, Environment and Urban Systems

The theory of central places and facilities location addresses the decisions that must be made when a dispersed demand must be served from a few central sites. Traditional libraries satisfy these conditions, and their locations are therefore amenable to analysis within this theoretical framework. The digital libraries that are rapidly emerging to augment traditional library services are superficially footloose, since the costs of overcoming distance have been reduced effectively to zero by the Internet. But digital libraries must be located somewhere; the criteria associated with location may simply be weaker, or less obvious. Criteria affecting the location of digital data stores are examined. Information of geographically determined interest is defined, and it is shown that geographic information is a subset. The paper concludes that digital libraries will emulate the location patterns of today's special collections, and that the libraries of the future will emphasize serving data of increasingly local interest. Geographic information will be much more prominent in the library of the future.

[4]
Goodchild M F, Proctor J 1997 Scale in a digital geographic world. Geographical and Environmental Modelling 1(1): 5-23

The representative fraction, the metric traditionally used by cartographers to characterize the level of geographic detail in a map, is not well-defined for digital geographic data. Increasingly complex and unsatisfactory conventions are needed to preserve this legacy of earlier technology. A series of requirements is defined for replacement metrics. For digital representations of fields six cases can be identified, but in only two cases is there a straightforward solution to the requirements. For digital representations of discrete objects, representative fraction can be replaced with any ordinal index of specification. We conclude that simple metrics having dimensions of length are preferable to the complex conventions required to specify representative fraction for digital geographic data.

[5]
Montello D R, Goodchild M F, Fohl P, Gottsegen J Implementing fuzzy spatial queries: problem statement and behavioral science methods.

Humans frequently express information about geographic location by using natural language terms, referring to regions and spatial relations in a fundamentally imprecise or 'fuzzy' way (e.g., 'near', 'downtown'). In order to increase the functionality of such spatial information systems as a digital geographic library, it is desirable to design them to interpret queries containing such fuzzy terms. To do this, it is necessary to determine the referents of fuzzy spatial queries and model them in the digital system. In this paper, we discuss traditional and more recent approaches to defining and modeling locational queries. We consider behavioral science methods for determining the referents of fuzzy locational terms and ways these methods could be implemented in a spatial information system. A case study example involving the fuzzy region 'downtown Santa Barbara' is presented, and we outline a prototype system for handling such fuzzy regions.


