Outline: Chapter 1
The Nature of the NIDR Challenge
- initial brief definition of terms: networked information discovery and retrieval; network resources (objects); metadata (including comments on etymology).
- scope of problem that is focus of this paper: how to improve ability of user to discover and access resources in the current internet-based networked information resource environment. Extent of software mediation in the NIDR process. How a mix of free and for-fee information will change the picture.
- Characterization of key features of the networked information environment relevant to NIDR problems:
  - very large scale, rapid growth; dynamic addition and relocation of information resources
  - extremely heterogeneous nature of resources
  - wide variation in granularity of resources; hierarchical resource organization
  - multiple generations of information resources and supporting access systems
  - distributed and autonomously managed resources
  - wide variation in quality of resource content and implementation
  - growth of “self-publishing” models of information distribution
  - combination of free & for-fee information resources
  - combination of public and private information spaces
  - no commitment by information providers to offer service registry within a central framework
  - very heterogeneous user base; varying expertise and needs, varying access capabilities
  - unrealistic (and poorly articulated) user expectations
  - poorly defined user selection requirements
  - information overload: too much overall information, and too much relevant information
- a closer look at the discovery process:
  - discovery as an iterative research activity; different kinds of discovery.
  - discovery as “catalog use”; performed by humans
  - components of discovery as a process: selection, collocation, duplicate elimination, ranking/differentiation, browsing, determining “fitness for use”.
  - hierarchical searching and granularity; discovering systems/information spaces; knowing where to search
  - the continued need for surrogates for objects in discovery on the net; arguments based on the limited ability to selectively fetch headers that are components of objects, performance issues, and economic and intellectual property issues (i.e. separate creation, control and distribution of surrogates and primary objects)
  - automated support for discovery as a continuing process: selective dissemination of information (SDI), personal agents, filters
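
Illustrative sketch (not part of the outline's argument): several of the discovery components listed above, namely selection, duplicate elimination, collocation and ranking, can be pictured as successive passes over surrogate records rather than over the primary objects. Everything in the sketch is an assumption made for illustration: the `Surrogate` fields, the `work_id` and `checksum` identifiers, the example locations, and the toy ranking criterion. Browsing and determining "fitness for use" are left out because they depend on human judgment.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Surrogate:
    """Hypothetical metadata record standing in for a network object."""
    location: str   # where the primary object could be fetched
    title: str
    work_id: str    # assumed identifier shared by versions of one work
    checksum: str   # assumed content fingerprint used to spot duplicates


def select(records, query):
    """Selection: keep surrogates whose title contains every query term."""
    terms = query.lower().split()
    return [r for r in records if all(t in r.title.lower() for t in terms)]


def eliminate_duplicates(records):
    """Duplicate elimination: keep the first surrogate seen per checksum."""
    seen = {}
    for r in records:
        seen.setdefault(r.checksum, r)
    return list(seen.values())


def collocate(records):
    """Collocation: group surrogates that describe the same work."""
    groups = defaultdict(list)
    for r in records:
        groups[r.work_id].append(r)
    return dict(groups)


def rank(groups):
    """Ranking: order works by number of distinct versions (toy criterion)."""
    return sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))


def discover(records, query):
    """Run the four passes in sequence over the surrogate records."""
    return rank(collocate(eliminate_duplicates(select(records, query))))


# Hypothetical surrogate records; the hosts are placeholders.
records = [
    Surrogate("ftp://a.example/doc.txt", "Network retrieval protocols", "w1", "c1"),
    Surrogate("ftp://b.example/doc.txt", "Network retrieval protocols", "w1", "c1"),
    Surrogate("gopher://c.example/doc.ps", "Network retrieval protocols (PostScript)", "w1", "c2"),
    Surrogate("ftp://d.example/guide.txt", "Guide to network retrieval", "w2", "c3"),
    Surrogate("ftp://e.example/cook.txt", "Cooking basics", "w3", "c4"),
]
result = discover(records, "network retrieval")
```

In this sketch the mirrored copy on the second host is dropped as a duplicate, while the plain-text and PostScript versions are collocated under the same work. None of the passes fetches a primary object, which is the point of working from surrogates.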
- a closer look at the retrieval/access process:
  - defined primarily by existing (simple) network retrieval protocols; these put an undue burden on discovery
  - parameters of access processes, e.g. costs and formats (static vs. dynamic issues); poor accommodation by current protocols
  - multistage, sequential nature of access/retrieval & subsequent use of network information objects.
  - low levels of interoperability targeted (moving bits, or application-specific file formats)
- key problems with achieving current NIDR objectives:
  - objects as viewed in the NIDR context are extremely simple
  - classic information retrieval issues; heavy use of natural language
  - lack of data sources (cataloging) upon which to base discovery
  - networked information retrieval issues (extended retrieval)
  - performance and architecture problems (technical issues) in large scale distributed systems
  - incorporation of nontextual objects and their description
  - nontechnical issues with major architectural implications: privacy, security, intellectual property, charging for information