Insight Storage and Retrieval in a Computer-supported Environment


Prepared in relation to the UNEP-HEMIS design proposal on environment/development information.

1.0 Background to comments

The comments in this note derive from consideration of several long-term concerns explored in connection with programs of the Union of International Associations. These are described in Annex I under the headings:

1.1 Maintenance of databases on international organizations

1.2 Maintenance of databases on "world problems" and "human potential".

1.3 Conceptual and terminological analysis.

1.4 Visualization of relationships.

1.5 Metaphors as vehicles of transdisciplinarity.

1.6 Institutional information systems.

1.7 Translation.

2. Current concerns

In the light of the above explorations, the following general concerns emerge:

2.1 Dissociation of terminology from concept handling: Because of the necessary biases of the library sciences, and their reflection in information system design, some fundamental distinctions are not adequately preserved. The point is perhaps best made with data records on "world problems":

(a) Many problems are only fuzzily defined. There is no standard terminology. Some require one (and usually more) strings of terms to capture the range of words used to name them. New strings may have to be added from time to time to reflect current intellectual fashions. Some may have to be deleted as sub-problems or variants are separated out of that initially identified.

(b) Problems are of course named with different strings of words in other languages.

(c) Computers are least efficiently used when records do not have a unique identifier. There is merit in separating the concept identifier from the names or descriptors which may from time to time be attached to it. Concept management may thus be kept distinct from data management.

(d) The challenge of classifying and re-classifying the set of "world problems" is a continuing one. There is not necessarily any final solution. There may be a range of temporary solutions. Classification exercises should be kept separate from concept management and data management.
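
The separation suggested in (c) and (d) can be illustrated with a minimal sketch in Python. It is only an assumption about how such a store might look: the names ConceptStore, add_term, remove_term and lookup, and the identifier "P0001", are hypothetical and do not come from the HEMIS proposal or from any existing system. The point shown is that the concept identifier stays fixed while strings of terms, in any language, are attached or detached over time.

```python
from collections import defaultdict


class ConceptStore:
    """Hypothetical store keeping concept identifiers apart from their names."""

    def __init__(self):
        # concept id -> set of (language, term) pairs currently attached
        self.terms = defaultdict(set)
        # lower-cased term -> set of concept ids (a term may be ambiguous)
        self.index = defaultdict(set)

    def add_term(self, concept_id, language, term):
        self.terms[concept_id].add((language, term))
        self.index[term.lower()].add(concept_id)

    def remove_term(self, concept_id, language, term):
        self.terms[concept_id].discard((language, term))
        # drop the index entry unless the term is still attached in some language
        if not any(t.lower() == term.lower() for _, t in self.terms[concept_id]):
            self.index[term.lower()].discard(concept_id)

    def lookup(self, term):
        """Return the concept ids a term currently points to (possibly several)."""
        return sorted(self.index.get(term.lower(), set()))


# Illustrative entries only; the identifier and terms are invented.
store = ConceptStore()
store.add_term("P0001", "en", "acid rain")
store.add_term("P0001", "fr", "pluies acides")
store.add_term("P0001", "en", "acid precipitation")
store.remove_term("P0001", "en", "acid precipitation")
```

Because lookup returns a list, a single term may point to more than one concept, and any classification scheme can then refer to identifiers such as "P0001" without being affected by changes in terminology.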

2.2 Insight capture. Clearly there is advantage in distinguishing the problem of data capture, handling and retrieval from information handling and retrieval. But much is assumed about the quality and value of "information", when already there is a problem of information overload. From the mass of information, there is a need to distinguish:

(a) Insights: A greater stress is needed on what might be called "insight capture", which has much to do with the question of tools to facilitate comprehension. Unless the user is able to learn while using, he will be unable to respond to information beyond the "mechanical" response to the question asked. Here I have been interested in the potential of metaphor and leitbild.

(b) Perceptions: In dealing with "world problems" (of which "acid rain" is an interesting example), the distortions to which information is subjected in order to serve various vested interests (including protecting the reputations of eminent scientists) need to be borne in mind.

(c) Concepts: With the increasing constraints on resources and learning time, there is merit in distinguishing key concepts from the mass of other forms of information. For each discipline the question might be asked: what are the contents of the concept sets relevant to different levels of competence? This matters given that a key individual in an institution may only be able to master a level below that which is considered desirable by relevant professions (which may be less than disinterested in defining what is desirable).

2.3 "Harmonization". There are many hidden problems in the stress placed upon the term "harmonization". There is a long track record of clever ways and reasons by which harmonization has been avoided. Some of the reasons are political, some have to do with psychological resistance to other people's empire-building tendencies, etc. The challenge faced by ACCIS in dealing with over 300 UN databases would make a valuable case study, if the political background could be revealed. Unless these issues are clarified, any future proposal will be vulnerable -- just as the ACCIS initiatives have always been vulnerable and undermined.

2.4 Potential of new technologies. Software techniques could be implemented to enable the user to navigate through the system whilst retaining an understanding of the whole. There are impressive new developments from Xerox PARC -- beyond "windows" to "walls" and "rooms". Virtual reality may also make a great difference in a very short time, especially with the advantages for CD-ROM. However, it is very clear that investments are made in response to market needs, and the need for non-specialized access is not considered a priority. It would be easy to argue that investments are made in response to quantity needs and not to quality needs. They are made in such a way as not to threaten the positions of experts seeking to protect their domains. Experience suggests that the more international, interdisciplinary, intersectoral, or sensitive-beyond-established-boundaries a project is defined to be, the less probable the funding (especially on a long-term basis).

2.5 Priorities. There is much hype about the information society and the global village. Many still aspire to a "global brain", or claim that it already exists. The issue in any new project, constrained by resources, is how to ensure innovative results for modest investment. An interesting example in relation to the environment is the creation of a database on all species (whether plant or animal) and their relationships in order to track vulnerability to pollution, risks to food webs, endangered species, etc. Given the number of species, an easy response is that such a project is not feasible. However, valuable approximations to the desired result may be achieved by using higher level clusters of species (classes, families, etc), only extending into specific species where resources and interest justify the work. No such database appears to exist, although clearly it may be made as simple as resources require whilst still retaining an overview -- however lacking in detail. Such a project is an interesting challenge for CD-ROM and for data visualization, including virtual reality representations of food webs with threatening factors.
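
The approximation by higher level clusters can be sketched as follows. This is a minimal illustration under stated assumptions, not a design: the Taxon class, its all_threats method and the placeholder entries ("Class A", "Family A1", "Species A1-x") are all invented for the purpose. The idea shown is that a threat recorded once at the level of a class or family is inherited by everything below it, so that species need only be entered where resources and interest justify the work.

```python
class Taxon:
    """Hypothetical node in a taxonomic hierarchy (class, family, species...)."""

    def __init__(self, name, rank, parent=None):
        self.name = name
        self.rank = rank
        self.parent = parent
        self.threats = set()  # threats recorded directly at this level

    def all_threats(self):
        """Threats recorded here plus those inherited from higher clusters."""
        node, found = self, set()
        while node is not None:
            found |= node.threats
            node = node.parent
        return found


# Placeholder entries, invented purely for illustration.
class_a = Taxon("Class A", "class")
family_a1 = Taxon("Family A1", "family", parent=class_a)
species_x = Taxon("Species A1-x", "species", parent=family_a1)

class_a.threats.add("water pollution")
family_a1.threats.add("habitat loss")
```

A query on species_x then reports both threats, although neither was recorded at the species level.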


3.1 Mission statement: As noted above, it is important to be aware of the factors opposing harmonization, especially those that cannot be printed in widely circulated reports. There is also a questionable assumption that policy-making bodies are anxious to receive harmonized information. Policy-making bodies in fact are usually relieved to be able to take advantage of lack of harmonization to pursue policies that might otherwise be more readily challenged (cf "acid rain"). Is it certain that "sound management" is the real desire of policy-making bodies? There is merit in reflecting on what happened to previous ambitious projects of this nature operating across institutional boundaries within the intergovernmental system. Briefly, the techniques used were: failure to implement, implementation under a watered-down mandate, redefinition to a narrower mandate, reduction of budgets, appointment of incompetent personnel, restriction on exchange of data, effective opposition to any harmonization (implementation of different bibliographic standards, incompatible computer systems, etc). ACCIS and its predecessors (IOB, TABS) merit study in that light. UNISIST and UNBIS also merit reflection. Why is there no UN system-wide documentation system, including the Specialized Agencies? Why has it proved impossible to extend this to other intergovernmental bodies?

3.2 Scope of data: It is always possible to demonstrate that "no other group holds data on..." by suitably specifying criteria. There remains the question of the extent to which HEMIS starts incorporating environmental data from the domains of other institutions (eg health, education, labour, etc) -- despite the views of those bodies.

3.3 Standard pleas: It is important to be aware of the tendency of international meetings to make standard pleas for further action, without the action in fact being necessarily wanted (eg who would say no to a bibliography, a newsletter, or another meeting?). A distinction should be made between:

(a) what users say they want

(b) what users will really use on a regular basis

(c) what users want but are unable to name (but usually find themselves obliged to pay expensive consultants or intelligence services to locate)

3.4 Track records of cooperation: It is appropriate to ask for some sort of objective study of the track record of cooperation in relation to documentation between international institutions. It is easy to claim that much has been achieved. It is not so easy to uncover what was not done and why. It is important to appreciate the level of intellectual and institutional investment in existing filing and classification systems. New projects cannot hope to claim that they are "neutral" and aimed solely at facilitating access to others. They will necessarily be perceived as a threat by others -- if only in competing for scarce resources. Much tokenism is exhibited in disguising this reality.

3.5 Fashionable topics: The international community is constantly exposed to new fashionable topics. But every decade or so there is a major "paradigm" shift of which "development", "energy" and "environment" are examples. These place institutions under severe political pressure to totally redesign their information systems to accommodate a new pattern of subject and institutional linkages. At this point in time, "environment" offers a clear pattern and set of priorities. The question is whether, when the next policy "surprise" emerges, an environmental information system will be organized in a sufficiently flexible manner to be reconfigured to handle the new set of priorities which Member States are liable to require.

3.6 Design considerations: There is great advantage in adopting a modular approach to design. This obviously facilitates maintenance and future developments (in response to revised requirements). But it also allows progress to be sensitive to the availability of resources.

3.7 Multi-lingual access: The issue here, as noted above, is the relation between a word, a term, a concept, and the various kinds of data identifier at the computer level (record keys, etc). Ideally there would be a fundamental distinction between the computer key problem, the conceptual problem, and the terminological problem (in whatever language).

3.8 Keyword vs Fulltext: Given the potential for information overload with many existing systems, surely emphasis needs to be placed on levels of insight or priority, even if these are user defined. Fulltext "information" will not necessarily offer insight -- which may call for artificial intelligence to separate out the dross. A mining metaphor is appropriate: it takes a lot of ore to extract relatively little mineral, which then has to be processed to render it into a useful form. Possibly attention should be given to the sequence: (a) concept, (b) keyword(s), (c) concepts network, (d) bibliographic information, (e) abstract, (f) fulltext. The more focus is placed on (e) or (f), the less investment is devoted to improving the quality of (a) through (c). It could be argued that users benefit most in terms of creativity from manipulating the system to discover more interesting questions. This requires facilities to handle the (a) through (c) items. Items (e) and (f) can always be found in response to even the most inappropriate questions. To what extent does HEMIS have a responsibility for encouraging users to ask more interesting questions?

3.9 Windows: One might question why such a hardware-demanding platform is required if the widest access is sought. In many ways Windows is a ploy to increase the unit costs of hardware, increasing the unavailability of such facilities to a wider user group in a time of global budget-cutting.

3.10 Closer cooperation: It is worth questioning the assumption that a new tool leads to closer cooperation. Consider older projects such as UNISIST and UNBIS (as noted above).

3.11 Utilization: It is worth questioning whether the kinds of retrieval system that tend to be designed do not facilitate responses that are less than useful -- thus leading to underuse of the system, and to pathetic attempts to boost user statistics to justify an inappropriate investment.

3.12 Graphic interfaces: It is readily assumed that iconic approaches designed in the West are appreciated in other cultures. However there was early concern that even at the level of control knobs or buttons on tape-recorders, there were distinct cultural preferences to which manufacturers had not been sensitive.

3.13 Document access priority: One approach to prioritizing, if required, might focus on navigation of the references, without giving access to documents.

3.14 IRS: Is it necessary to use this well-known acronym, which can only breed confusion, especially in the USA?

3.15 Hyper-links: There is a need to distinguish between the facility of clicking through a hyper-link pathway and that of acquiring some overview of where one is in a system of such pathways. Current techniques reinforce the rat-in-the-maze approach, because mapping pathways using graphical techniques is avoided (see references to Mapping Hypertext, for example). Contextuality is a requirement. Knowing that one is in the "water pollution" hyper-text maze is less helpful than seeing where one is in that maze.
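
The distinction between clicking through a pathway and seeing where one is can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the links table, the function names path_from and context, and the sample topics are hypothetical and are not taken from the HEMIS proposal. The sketch simply returns, for the current node, the route from the entry point together with the onward choices, which is the minimum needed to draw a map rather than a single corridor.

```python
from collections import deque

# Hypothetical link structure: node -> nodes reachable by one click.
links = {
    "environment": ["water pollution", "air pollution"],
    "water pollution": ["heavy metals", "eutrophication"],
    "air pollution": ["acid rain"],
    "heavy metals": ["mercury"],
}


def path_from(start, target):
    """Shortest click path from start to target, found breadth-first."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in links.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # target not reachable from start


def context(start, current):
    """Where the user is: the route so far plus the onward choices."""
    return {"path": path_from(start, current), "onward": links.get(current, [])}
```

A graphical interface could render the result of context("environment", "heavy metals") as a highlighted route on a map of the whole structure.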

3.16 Thesauri: Several issues here:

(a) "Every keyword is related to one unique identifier".

-- This would seem to imply that concept and word are unambiguously tied. This may indeed be a valid assumption in the case of the HEMIS subject matter. "Mercury" is always mercury (except when it is another planet). This is less clearly so in the case of "health", which may be understood in a variety of ways. Even less so in the case of value-loaded terms such as "quality of life" or "well-being". In addition to the points made above, I refer you to the study by Huff on homonyms and homographs. It is not clear (Fig 10) whether field (F) resolves all these issues.

-- What happens in the case of keywords that can only be rendered unambiguous by context: mathematical "calculus" vs urinary "calculus"?

-- Is the track record such that it is possible to believe in harmonization in relation to keywords? The UN record on defining "aggression" is an interesting example. "Peace" might be another. "Transnational corporations" is another interesting one. What is of interest are the distinct concepts of aggression or peace, not that they all happen to use the same inadequate keyword.

-- With the increasing interest in the cognitive function of metaphors, to what extent would the system be able to handle data using keywords metaphorically? Is HEMIS supposed to be a metaphor-free system? Is this handled by provision for synonyms, even though the synonym may then point beyond the same subject field to make a metaphoric point?

(b) "This thesaurus is based on the INFOTERRA definitions". Here there are several problems. INFOTERRA is a governmental system and therefore subject to pressure from Member States to include or exclude terms which are politically embarrassing. Thesauri can be studied for such bias, recognizing that classification is a political act. It is unclear to what extent INFOTERRA has allowed itself to be sensitive to non-UNEP issues -- bearing in mind an early briefing from UNEP that any communication that did not correspond to the mandates of one of its 12 departments could not be processed. Tying a classification system to the changing political matrix of institutional mandates as to what constitutes "environment" (or what topics relate to what departments) is intellectually embarrassing, as the early history of UNEP reveals.

(c) What is the conceptual significance of someone disagreeing with the thesaurus structure? And what can who do about it and when? Some issues are directly to do with conflicting classifications -- as can readily be seen with competing taxonomic classification systems.

(d) "The HEMIS thesaurus may be enlarged by partner institutes in deeper hierarchy levels". What is the OECD Macrothesaurus experience in this regard?

3.17 Retrieval: In relation to our own system, there seems to be a dangerous reliance on descriptors when an approach more tolerant of fuzziness is required.

3.18 Editing thesaurus: Making this a centralized responsibility creates a situation from which key thesauri have suffered -- especially when they become heavily institutionalized, with people holding a heavy intellectual investment in one philosophy as opposed to another. An alternative is to allow users to make proposed (coded) modifications -- with approval (coded) following as and when the central office can get around to it. This allows users to encode new search links in a decentralized mode, making the system immediately relevant. Users without such privileges can choose to follow their own philosophy of proposed extensions -- or ignore them.
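
The alternative of coded proposals followed by coded approval might be sketched along the following lines. This is an assumption about one possible arrangement, not a description of HEMIS: the Link and Thesaurus classes, the status values "proposed" and "approved", the user codes and the sample keywords are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """A hypothetical thesaurus link between two keywords."""
    source: str
    target: str
    proposed_by: str
    status: str = "proposed"  # becomes "approved" after central review


class Thesaurus:
    def __init__(self):
        self.links = []

    def propose(self, source, target, user):
        """Any user may add a link; it is usable at once but marked as proposed."""
        link = Link(source, target, user)
        self.links.append(link)
        return link

    def approve(self, link):
        """The central office confirms a link whenever it gets around to it."""
        link.status = "approved"

    def related(self, keyword, include_proposed=False):
        """Targets linked from keyword; proposed links only on request."""
        return [
            link.target for link in self.links
            if link.source == keyword
            and (link.status == "approved" or include_proposed)
        ]


# Invented sample entries for illustration only.
thesaurus = Thesaurus()
first = thesaurus.propose("acid rain", "forest dieback", user="user-17")
thesaurus.propose("acid rain", "lake acidification", user="user-42")
thesaurus.approve(first)
```

A user who wishes to follow proposed extensions calls related with include_proposed=True; one who prefers the authorized structure leaves the default and sees only approved links.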

3.19 Disagreement: Advances in knowledge come in part from the development of alternative views. These usually have implications for the way information is classified. The question is then how a system is going to hold the two (or more) competing classification variants during the transition phase. Or is the classification going to resist the emerging perspective until the replacement of the old becomes authorized -- but by whom? The pattern of relationships between data elements could usefully reflect an extension of the citation analysis approach, namely to show that Document C is critical of Document F and supportive of Document G.
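
The extension of citation analysis mentioned here amounts to attaching a stance to each citation. A minimal sketch, assuming only two stances and using hypothetical names (relations, cite, stance_towards) and an additional invented "Document H", might be:

```python
from collections import defaultdict

# stance -> list of (citing, cited) pairs; the stance labels are hypothetical
relations = defaultdict(list)


def cite(citing, cited, stance):
    """Record that one document is 'critical' or 'supportive' of another."""
    if stance not in ("critical", "supportive"):
        raise ValueError("unknown stance: " + stance)
    relations[stance].append((citing, cited))


def stance_towards(cited):
    """Group the documents citing `cited` by the stance they take."""
    return {
        stance: [a for a, b in pairs if b == cited]
        for stance, pairs in relations.items()
    }


cite("Document C", "Document F", "critical")
cite("Document C", "Document G", "supportive")
cite("Document H", "Document F", "supportive")  # invented extra example
```

With such records a retrieval system could show, for any document, which later items contest it and which build on it, rather than a bare citation count.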

3.20 Obsolete documents: One of the problems in information systems is the accumulation of items which are obsolete or discredited. How are these to be distinguished by the system -- especially since they are a basis for prioritizing? Using the date is totally inappropriate, since older material may be valuable either for data or as an original insight. The responsibility could be left to the user, but this imposes a heavy scanning burden (as well as increasing costs) for lack of a sensible technical solution. An alternative is to build on citation analysis techniques, recognizing that it is often the uncited publications which herald the future, not those which are part of an intellectual fashion.

3.21 Interdisciplinarity: Despite being concerned with one of the most interdisciplinary subject areas, the HEMIS approach seems to have no special provision for the challenging issue of interdisciplinary searches and conceptual integration. Is this question assumed to be irrelevant because searches can be made using any range of keywords or using "interdisciplin*"? Should such a system not be doing more to encourage searching in terms of the relevant categories around any specialized search? This is not a hierarchical notion, but is this reflected in the network solution? How is integration reinforced, and to what extent are such issues the responsibility of HEMIS rather than of the user?

3.22 Security classification: No intergovernmental system can be created without making provision for security classifications. This does not seem to be mentioned. There is also the issue of how a thesaurus would be extended to cover classified topics. One of the issues with INFOTERRA was restricted access excluding "nongovernmental" participation -- and thus presumably excluding new topics which such bodies were exploring before they could be considered worthy topics by UNEP.

My conclusion is that, in the main, the problems are not technical, especially if HEMIS is designed to deal only with information items which are unambiguously defined (namely excluding all social or politically loaded factors) and is more focused on information (in response to pre-defined questions) than on "insight" (in response to interactive questioning) and on issues of how information is to be communicated and comprehended. There are some very elegant technical solutions, especially those which allow the end-user to approach the data through whatever classification system he chooses, even a personalized one. What is required at this stage is a sense of what tends to happen to interesting information system proposals in the light of past experience -- and who would be motivated to discuss it, able to do so, and in a useful way without covering up the sad realities.

Annex I


The comments in the accompanying note derive from consideration of several long-term concerns explored in connection with programs of the Union of International Associations. The references here are to publications and reports produced by the UIA.

1.1 Maintenance of databases on international organizations

This is a long-term project, originating in 1910, which gives rise periodically to production of hardcopy versions. CD-ROM versions are scheduled for 1993.

1.2 Maintenance of databases on "world problems" and "human potential". This long-term project, started in 1972, gives rise periodically to an:

1.3 Conceptual and terminological analysis. This work was initially undertaken with the Committee on Conceptual and Terminological Analysis (COCTA) and resulted in a report:

Further work was undertaken in discovering ways to deal with the fuzzy concepts underlying "world problems" and "human values". This work took its main form in the design of databases and software to handle the above publications. However a report was presented to a UNESCO-sponsored Conference on Conceptual and Terminological Analysis (Bielefeld, 1981):

Related work was undertaken in connection with the United Nations University project on Goals, Processes and Indicators of Development (1978-1982), notably:

Further work was undertaken in relation to socio-cultural aspects of Information Overload and Information Underuse, a project of the United Nations University:

This work continues in connection with the Encyclopedia of World Problems and Human Potential because of the level of conceptual ambiguity in information handling.

1.4 Institutional information systems. In parallel with the above initiatives (during the various struggles of IOB from which ACCIS finally emerged), reports were prepared for two UN-sponsored conferences on the information challenges facing the intergovernmental organizations and especially the Specialized Agencies:

1.5 Visualization of relationships. There is a long-term concern that current means of presenting information in text form do not highlight the importance of relationships between data elements. Various approaches have been explored to the challenge of representing complex networks of entities in a comprehensible way using computer graphics techniques. These have been summarized in the following report:

1.6 Metaphors as vehicles of transdisciplinarity. Aspects of the above work on concepts are now dealt with in a long-term project on the use of metaphors to handle higher orders of complexity. Here the concern is with the unexplored cognitive potential of metaphors to provide conceptual scaffolding as a guide to transdisciplinary initiatives and problems of governance. Reports include:

1.7 Translation. Although most information is held in English, non-English organization names and keywords are also used for access and indexing. Investigations into the possibility of machine translation from English into other languages are being undertaken, notably for the Yearbook of International Organizations (which last appeared in French in 1980). These distinguish three levels:


Anthony Judge. Knowledge representation in a computer-supported environment. 1977

