Anti-Developmental Biases in Thesaurus Design
Paper for the Conference on Conceptual and Terminological Analysis in the Social Sciences (Bielefeld, 1981), sponsored by the Committee on Conceptual and Terminological Analysis (COCTA) and the International Federation for Documentation (FID). A somewhat abridged version appears in Fred W. Riggs (Ed.): The CONTA Conference: Proceedings of the Conference on Conceptual and Terminological Analysis in the Social Sciences. Frankfurt/Main, INDEKS Verlag, 1982, pp. 185-201.
Bias 1: Static Bias Associated with Noun Categories
Bias 2: Low-Context Bias Associated with Western Science
Bias 3: Pattern Conservation Bias
Bias 4: Dysfunctional Bias
Bias 5: Insensitivity to Thesaurus Implications
Bias 6: Avoidance of Top-of-Hierarchy Issues
Bias 7: Preference for Adaptive "Maintenance" Thesauri
Bias 8: Investment in Rigid, Anti-Experimental Systems
Bias 9: Depersonalized Portrayal of Thesauri
Bias 10: Concealment of Contradictions
Bias 11: Concealment of Values
Bias 12: Preference for Simplistic List Structures
Bias 13: Exclusion of Uncodeable Dimensions
Bias 14: Mechanistic Concept of Thesaurus Integration
A number of biases seem to manifest themselves frequently in the process of thesaurus construction. These biases are inherently anti-developmental, introduce distortions into the design process and constitute obstacles to social development. The effect of these biases is particularly serious in the social science domain.
This is a preliminary investigation, intended to open up discussion. The supporting arguments and evidence are not presented here, although they form part of the argument of several earlier papers (1,2,3,4,5).
Bias 1: Static Bias Associated with Noun Categories
Most thesauri are concerned solely with ordering nouns or objects (called 'subjects'). The position of a noun in the scheme may be affected by an adjectival qualifier, but the emphasis nevertheless remains on bounded objects (even if they are abstract). The result of using any such thesaurus is therefore necessarily static: a static assemblage of nouns.
This point is argued by Burger in a paper for this conference (6). The theoretical physicist David Bohm has explored the same question from his own perspective:
Can thesauri, using essentially static categories, adequately order information relevant to development? Clearly static schemes have a role to play, but given the essential challenge of the development dynamic, are not other types of thesauri also required to safeguard and highlight this dynamism?
Should not complementary thesauri be designed using verbs (processes) as categories, emphasizing the relationships between such processes? It is not sufficient to argue that the noun "development" implies a "development process" or "to develop". A noun signifies a process deprived of its essential dynamic.
Thesauri using "to develop" as a term would raise many interesting questions which the static "development" at present obscures. The widespread failure of "development" and the prevalence of "maldevelopment" may at least partly be due to a failure to order information in a more dynamic mode.
Bias 2: Low-Context Bias Associated with Western Science
Given the striking contribution of the Western world to the ordering of knowledge in recent centuries, it is easy to forget that there are other approaches to ordering reality which have been favoured in the past, continue to have their advocates, and may be significant for the future. The scientific method has even been considered a by-product of the Indo-European language group. But there are other language groups, especially in the highly populated developing countries. It is too easily assumed that a thesaurus meaningfully structured for the Western mind is an adequate vehicle of knowledge for users in non-Western cultures, for whom other dimensions may be of greater significance.
David Bohm clarifies the subtle anti-developmental consequences of this bias as follows:
The effective imposition of Indo-European classification schemes on other cultures may therefore not only do violence to their cultural perspectives but may also obstruct both communication to those cultures (knowledge transfer) and communication from them (of the high-context variety rare in Western knowledge systems).
How can high and low-context dimensions be blended together in a development-oriented thesaurus?
Bias 3: Pattern Conservation Bias
Thesauri are usually designed with "spare positions" at which new sub-categories can be added. The healthy "evolution" of the thesaurus over time is then seen as the gradual filling up of these positions. Serious problems arise when there are no longer any spare positions available for new subcategories. Various "fudging" techniques are then used to "squeeze them in".
At no point is there any question of changing the fundamental pattern around which the thesaurus is built. Thesauri are assumed to grow by extension of a pre-defined pattern and not by transformation of that pattern. This conservative bias is a definite obstacle to conceptual advance whenever interdisciplinary subjects become fashionable (e.g. development, environment).
This bias is itself protected and reinforced by the manner in which resources are invested in the documentation systems based on the untransformable thesaurus. The consequence is that if a new pattern appears desirable, an alternative thesaurus has to be designed (usually within a different institutional context, as in the case of UNESCO/SPINES). The new thesaurus, however, tends also to be built with the same pattern-conserving bias.
Users are thus confronted with a set of unrelated thesauri whose advocates are seldom concerned by the lack of relationship between them. (This gives rise to what might be called the "politics of the thesauri arena"). This bias is therefore a guarantee of discontinuity and of inter-thesauri hubris - both of which undermine the effective mobilization of conceptual resources for development.
Bias 4: Dysfunctional Bias
Most thesauri are of necessity (given the manual processing tradition) fairly simple structures. They are merely an evolutionary step beyond the list. The main concern is to be able to store and retrieve reasonably well-identified documents. Unfortunately, this design philosophy is insensitive to functional relationships between the phenomena thus encoded. The fact, for example, that mercury may penetrate through food chains to affect seriously the survival of a bird species is not something that such a thesaurus is designed to highlight (especially in the case of a specialized
Whilst this might indeed be the case -- if the user knew how (or why) to phrase the correct question -- there is a major difference between a thesaurus designed to highlight (or even "foresee") such relationships, and one which simply "absorbs" the relevant document and "observes" the relationship. Such relationships are vital to understanding the development process.
Bias 5: Insensitivity to Thesaurus Implications
Although few would argue that the universe is organized into the categories reflected in a given thesaurus, unfortunately many users are poorly served by the mechanistic category assemblages characteristic of existing thesauri. Because thesauri are the prime concern of high-volume documentation systems, it is easy to forget the needs of those who want information directly relevant to development. Such users may be misled by the structure of existing thesauri and their simplified versions. Or, frustrated by the lack of a relevant thesaurus, they may well have to design their own -- however primitive.
The irresponsibility of those skilled in thesaurus design may be seen in their lack of concern for the following user needs:
(c) Curricula design: Consider the relation between thesaurus design and the elaboration of a curriculum for a school or university. Despite the developmental significance of curricula, it is highly probable that many curricula are "bad" thesauri, and many thesauri
(d) Organization charts: The organization chart of any institution, including governments, is in effect a thesaurus of action responsibilities. Again, it is highly probable that many such charts do not benefit from the skills of thesaurus construction, and that many thesauri are a poor foundation for structuring organizational responsibilities in response to development issues. This problem also extends to planning and policy formulation. Many thesauri may simply reinforce bad policymaking.
Bias 6: Avoidance of Top-of-Hierarchy Issues
Most of the effort in thesaurus construction is directed to clarifying problems within particular domains. Little effort, by contrast, is directed toward clarifying relationships between the major hierarchies within which this effort is made. The identification of what constitutes the "top of the hierarchy" appears to be an empirical process influenced by:
From a structural viewpoint there is little to distinguish between them, since both ancient and modern groupings take the form of lists or trees whose mathematical description is very similar. The lack of innovation and experiment is to be contrasted with the extraordinary structural variety in the attempts over the last two centuries to move beyond the original lists of chemical elements (13). Some exceptions are noted below.
From a content viewpoint, there have of course been many innovations. But what remains implicit in thesaurus design is the basis on which major groupings, such as "religion" or "art", are included, downgraded or excluded from a thesaurus which is concerned with "economic and social development".
How is the presence of "art" as one major grouping to be justified in a thesaurus which contains "science"? How are other major groupings to be recognized? What does the historical development of thesaurus construction reveal about blind-spots in the selection of major domains? On what theoretical ground is it possible to stand in order to predict current blind-spots which might be of vital significance to the development process?
To answer such questions, a functionally-oriented awareness is required. What is it in the way a social group functions which determines the way that group cuts up the field of its perception (and then proceeds to reinforce the subdivision by institutionalizing it in various ways)? Avoiding such questions ensures that any thesaurus will serve at least some of the cultures of this highly diversified planet poorly.
Bias 7: Preference for Adaptive "Maintenance" Thesauri
A recent UNESCO-endorsed report of the Club of Rome entitled No Limits to Learning (14) stresses the importance for society of "innovative learning" as contrasted with the traditional forms of "maintenance" or "adaptive" learning, particularly if humanity is to preserve and develop its heritage through the present combination of crises.
In this light, it is fair to state that most thesauri are adapted, after the fact, in response to new issues. Every attempt is made to fit new issues into old frameworks which failed to highlight the last crises. New thesauri, when they are designed, are generally uncreative compromises between the faults of existing thesauri. They do not break through to a new level of significance. It is therefore not surprising that society is ill-equipped to marshal its resources in response to previously unforeseen crises in the development process.
Bias 8: Investment in Rigid, Anti-Experimental Systems
Thesauri tend to be designed in one of two ways, possibly with some overlap between them:
The problem seems to lie in the failure to separate control of: (a) documents, (b) document location ("shelf") numbers, (c) machine-readable document (bibliographic) descriptions, (d) thesaurus terms, (e) thesaurus term references (e.g. a "line number" as in a word processor file), (f) classification codes, and (g) classification schemes.
An alternative approach, entirely feasible using computers, is to work with a variable classification code. This is possible if the code (f) is not permanently attached to the document description (c). By using the reference number (e), the code (f) attributed to a thesaurus term (d) can be modified at any time. This means that alternative schemes of classification codes (f) can be envisaged so that the thesaurus terms (d) can be regrouped and edited experimentally and, whenever desired, the document descriptions (c) can be reordered by computer according to the revised scheme. From a computer viewpoint, this is best done using a permanent document number (b). Clearly there is no reason why a preferred "universal" scheme of codes should not be part of the computer programme "library" of ways of ordering the terms, and in fact several such schemes could be available, without preventing experiments on alternative thesaurus designs. Changing an item in the thesaurus does not therefore involve the usual high cost of physically re-indexing a large number of documents, but only the computer processing cost of changing the classification code linked (electronically) to the "address" of the document.
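To make this separation concrete, here is a minimal Python sketch, assuming simple in-memory dictionaries in place of a real documentation system. Document descriptions (c) carry only term references (e); each classification scheme (g) is a separate mapping from references (e) to codes (f), so the same documents can be regrouped under a revised scheme without re-indexing. All names (`DocumentDescription`, `SCHEME_A`, `SCHEME_B`, `group_by_scheme`) and the sample codes are hypothetical illustrations, not the system the author used.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentDescription:
    """Machine-readable description (c) of a document (hypothetical structure)."""
    doc_number: int             # permanent document number
    title: str
    term_refs: tuple[int, ...]  # thesaurus term references (e); no codes (f) stored here


# Thesaurus terms (d), keyed by their permanent reference numbers (e).
TERMS = {101: "Food chains", 102: "Mercury pollution", 103: "Bird species"}

# Two alternative classification schemes (g). Each maps a term reference (e)
# to a classification code (f); editing a scheme never touches the documents.
SCHEME_A = {101: "ENV.1", 102: "ENV.2", 103: "BIO.1"}
SCHEME_B = {101: "BIO.2", 102: "HEALTH.1", 103: "BIO.1"}

DOCS = [
    DocumentDescription(1, "Mercury in aquatic food chains", (102, 101)),
    DocumentDescription(2, "Decline of a coastal bird species", (103,)),
]


def group_by_scheme(docs, scheme):
    """Regroup document numbers under the codes a scheme assigns to their terms."""
    groups = defaultdict(list)
    for doc in docs:
        for ref in doc.term_refs:
            code = scheme[ref]
            if doc.doc_number not in groups[code]:  # list each document once per code
                groups[code].append(doc.doc_number)
    return dict(sorted(groups.items()))


print(group_by_scheme(DOCS, SCHEME_A))
print(group_by_scheme(DOCS, SCHEME_B))
```

Reassigning a term to a different code only means editing one entry of a scheme dictionary; the next call to `group_by_scheme` reflects the change while `DOCS` itself is left untouched.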
Such an alternative has already been used by the author to regroup 3,100 international organizations in terms of their 81,000 links to member countries (15). The country code can be varied to order the information in terms of alternative regional groupings of countries. The same procedure is being used to regroup some 10,000 internationally-oriented bodies by subject [applied from 1982 in the Yearbook of International Organizations, Vol 3] (16). Whilst at this stage the depth of indexing may be unsatisfactory for some purposes, the ability to refine and experiment with the classification scheme introduces a much-needed dynamic element into such endeavours. Clearly such an approach has the additional advantage of providing a more realistic educational tool for those learning about some aspects of the problems of thesaurus design.
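Varying the country grouping can be sketched in the same spirit. The following minimal Python example assumes that membership links are stored as (organization, country code) pairs and treats each regional grouping as an interchangeable mapping from country code to region. The organizations, country codes, the two groupings and the function name `organizations_by_region` are invented for illustration and do not come from the Yearbook data.

```python
from collections import defaultdict

# Hypothetical membership links: (organization, member country code).
LINKS = [
    ("Org A", "FR"), ("Org A", "SN"), ("Org A", "BR"),
    ("Org B", "FR"), ("Org B", "DE"),
    ("Org C", "SN"), ("Org C", "KE"),
]

# Two alternative regional groupings of the same country codes (hypothetical).
CONTINENTS = {"FR": "Europe", "DE": "Europe", "SN": "Africa", "KE": "Africa", "BR": "Americas"}
NORTH_SOUTH = {"FR": "North", "DE": "North", "SN": "South", "KE": "South", "BR": "South"}


def organizations_by_region(links, grouping):
    """Organizations having at least one member country in each region of a grouping."""
    regions = defaultdict(set)
    for org, country in links:
        regions[grouping[country]].add(org)
    return {region: sorted(orgs) for region, orgs in sorted(regions.items())}


print(organizations_by_region(LINKS, CONTINENTS))
print(organizations_by_region(LINKS, NORTH_SOUTH))
```

The links are stored once; only the small grouping table is swapped, which is what keeps experiments with alternative regional orderings cheap.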
Bias 9: Depersonalized Portrayal of Thesauri
Thesauri tend to be portrayed as abstract structures from which subjective personal elements have been eliminated. To ensure implementation, the design of thesauri is limited to a small elite (whose professional status benefits in consequence). In fact, however, as any debate among such elites quickly demonstrates, thesauri are highly personal constructs and (as social acts) can engender very emotional responses. Even from a purely logical point of view, as Francisco Varela demonstrates: "In contrast with what is commonly assumed, a description, when carefully inspected, reveals the properties of the observer" (17). A thesaurus is a description which characterizes the designer (or the designing institutions).
If thesauri embody personal, ideological, cultural, and operational biases in this way, this should be more clearly stated (e.g. in the introduction to any UNESCO thesaurus) together with the implications of that statement. But it also follows that much greater emphasis should be placed on arranging for each user to enjoy the creative advantages of personalizing the thesaurus through which s/he wishes to perceive any data set.
This is especially true given the developmental significance of transforming one's own thesaurus as one's perspective matures, rather than being imprisoned in some institutionalized construct: depersonalization has never enhanced the innovative responses needed at this time. The computer possibilities for doing this have been noted above.
Bias 10: Concealment of Contradictions
The essence of the dramatic situation faced by humanity at this time lies in the conflict between laudable concerns such as economic production and environmental quality, or between communication and the preservation of the uniqueness of different cultures, etc. Whether such polarities are viewed as inherently contradictory or as vitally complementary matters less than the fact that this dynamic feature is totally absent as a structural dimension of thesaurus design. The unfortunate exception is that some institutionally inspired thesauri are designed to exclude particular poles when the institution is an advocate of the opposing complementary.
This is especially unfortunate when the institution is obliged to alternate between the polar positions (as in the case of public agencies in a 2-party governmental system).
It is useful to speculate on the advantages of designing into thesauri, not only dyadic, but also triadic, complementarities and those corresponding to greater numbers of set elements (4).
Bias 11: Concealment of Values
In the ongoing debate as to whether science, and especially social science, can be "value-free", it is easy to assume that thesauri (if nothing else) are free of inbuilt value orientations. However, as has been implied above, this is far from true. In fact each decision in the design of a thesaurus is influenced by a set of values which is never rendered explicit (except in the structure of the thesaurus itself). Expressed somewhat differently:
This situation cannot be avoided, but a more creative response is possible if a user can "personalize" the thesaurus by expanding, collapsing or reallocating categories at will (as suggested under Bias 8), and then compare the result with the categories of alternative value perspectives. It is even useful to reflect on the possibility of computer-generated "distortion indicators" (when a user suppresses all environmental categories from an industry thesaurus, for example).
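One possible reading of such a "distortion indicator" is sketched below in Python, assuming that each category of a reference thesaurus has been tagged with the broad value domain it serves (an assumption: the paper does not say how such tagging would be done). The indicator reports, per domain, the share of categories a user has suppressed in a personalized view, so suppressing every environmental category shows up as a share of 1.0. The category names, domains and function names are hypothetical.

```python
from collections import Counter

# Hypothetical reference thesaurus: category -> broad value domain it serves.
REFERENCE = {
    "Steel production": "industry",
    "Plant efficiency": "industry",
    "Air pollution": "environment",
    "Waste disposal": "environment",
    "Occupational health": "social",
}


def distortion_indicator(reference, suppressed):
    """Share of each value domain's categories hidden in a personalized view."""
    totals, removed = Counter(), Counter()
    for category, domain in reference.items():
        totals[domain] += 1
        if category in suppressed:
            removed[domain] += 1
    return {domain: removed[domain] / totals[domain] for domain in sorted(totals)}


def fully_suppressed(reference, suppressed):
    """Value domains that have vanished entirely from the personalized view."""
    shares = distortion_indicator(reference, suppressed)
    return [domain for domain, share in shares.items() if share == 1.0]


user_view = {"Air pollution", "Waste disposal"}  # user hides environmental categories
print(distortion_indicator(REFERENCE, user_view))
print(fully_suppressed(REFERENCE, user_view))
```

A per-domain share is only one choice; weighting categories by the number of documents indexed under them would be an equally plausible variant.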
Why is it that no thesauri have been produced with values as categories? Or were these the "celestial" frameworks of the medieval period, which are made to appear ridiculous by the currently fashionable noun frameworks? As suggested above, modern value thesauri should be possible and would seem to be a vital support for development as an essentially value-governed process, and a corrective to some present excesses. On this point, Hall states:
Bias 12: Preference for Simplistic List Structures
As noted above, most thesauri are a form of nested list structure, which is one of the simplest structures in the range explored by man, both in the arts and in the sciences. One response to this has been to abandon, more or less completely, dependence on a thesaurus in favour of associative structures.
This possibility is encouraged by the power of computer text analysis on data bases, which leaves it to the user to order the data into any hierarchies desired: a form of conceptual "anarchy" in reaction to the "imperialism" of list structures. A list does not order the relationships between its elements except in relation to nested sublists or in the case of a list in series form. This does not imply that such relationships are lacking, merely that they cannot be reflected in the list form. Note that a list is in fact a series of "points", but it is not necessary to conceive of it as such. The points could be represented as areas on a surface. It is only in the matrix that the manner in which the total area is cut up becomes explicit.
As advocated by Ingetraut Dahlberg (18), for example, the matrix is an important step beyond the list. It is important because of the extra dimension of order imposed upon the categories. It is worth noting that a first attempt at this, which bears a remarkable structural resemblance to one form of the periodic table of chemical elements (13), seems to have been a Renaissance "Torre della Sapienza" (19).
As noted in an earlier paper (20, p.292):
Now to the extent that the matrix is complete in its coverage, there really should not be any "wall". The matrix should in such cases in effect "wrap around" the observer; all is window and nothing is implicit, unexplicated, or excluded. If this is not so, then the wall should be conceived as wrapping around the observer, possibly with other windows corresponding to other partial views of the external totality to which the observer may turn his attention. From this point of view the conventional two-dimensional matrix raises the question of the conceptual significance of crossing the encompassing boundary. It seems irrational and unmeaningful because the wall is unrecognized. There is almost a flavor of danger of "falling over the edge", as sailors feared with the early "flat earth" models.
Again referring to the earlier paper (20, pp. 292-3):
Further investigation of the possibility of introducing non-linear curvature into the traditionally planar preoccupations of thesaurus design could well provide clues to a new level of macro-ordering which is hospitable to a variety of linear/planar thesauri but nevertheless establishes valuable links between them.
Bias 13: Exclusion of Uncodeable Dimensions
It is easy to argue that thesaurus design should be restricted to a concern for categories which can be embodied in an extended list structure. But, given the present vigorous discussion about right- and left-hemisphere approaches to knowledge, it is legitimate to ask whether such thesauri are not simply artefacts of the left-hemisphere analytical mind, and as such are functionally incomplete. This however raises the question of how right-hemisphere holistic dimensions are to be introduced without doing violence both to their very nature and to the preservation of the distinctions vital to the hard-won achievements of left-hemisphere thought.
It has been assumed that this is not possible because of the inherently analytic nature of the thesaurus structure. But as the previous section suggests, macro-ordering possibilities may be introduced which give a holistic dimension. The question is whether these must necessarily be linear (left-hemisphere analytical) forms of ordering or whether non-linear (right-hemisphere) forms can be introduced. Such a step would do much to retrieve thesauri from their somewhat isolated "archival" function in society, as well as to establish a bridge to the vital philosophical concern for the integrative, paradoxical, non-rational relationship between complementaries and opposites. In this connection, it is much to be regretted that one of the most renowned Eastern philosophers concerned with this matter, namely Lao Tse, did not also leave a record of his reflections as keeper of the Chinese imperial archives in the sixth century B.C.
In the light of earlier arguments (4), it may be possible to introduce such new dimensions at the macro-level so that they do not affect the analytical detail which is the present focus of thesaurus design. It is possible that thesauri will reach a new threshold of maturity and relevance to human and social development once this is achieved.
Bias 14: Mechanistic Concept of Thesaurus Integration
Current approaches to thesaurus integration are overly simplistic given the range of structures which man has now explored conceptually.
As argued in an earlier paper (23): there are two extremes in the conventional approach to "integration":
Much richer approaches to thesaurus integration emerge from, and are necessitated by, such varied domains as ecosystem integration, "oscillatory" integration in multi-party political systems, education, strategy, etc. Any organic form of integration which matches the dynamism of real-world phenomena is perhaps necessarily oscillatory. Surprisingly, perhaps, there is in fact much to be learnt from the theory and philosophy of music as a guide to further investigation. It is refreshing to note how this possibility emerges from reflections on the non-Western, 4,000-year-old chanted hymns of the Rg Veda of the Indian tradition. A careful exploration of this work by a philosopher, Antonio de Nicolas (24), using the non-Boolean logic of quantum mechanics (25), opens up valuable approaches to integration. The unique feature of the approach is that it is grounded in tone and the shifting relationships between tones. It is through the pattern of musical tone that the significance of the Rg Veda is found.
Of the greatest interest is the link made by de Nicolas with P.A. Heelan's concern with "The logic of changing classificatory frameworks" (25) in terms of the conceptual freedom of quantum logic -- which is in complete contrast to the essentially mechanistic structure of conventional thesauri. It is difficult to imagine that significant breakthroughs would not emerge from investigation of such leads in terms of thesaurus design.
This paper is necessarily far from complete but hopefully will prove a stimulus for discussion.
The biases identified are an attempt to give form to the underlying concern that thesaurus design is at present largely counter-productive in terms of the development of society, especially in the manner in which it reinforces, rather than alleviates, the trend towards social fragmentation. As David Bohm indicates:
Only a view of knowledge as an integral part of the total flux of processes may lead to a more harmonious and orderly approach to life as a whole rather than a static and fragmentary view, which does not treat knowledge as a process, and which splits knowledge off from the rest of reality (7, p.63). The latter view brings about a thoroughgoing confusion that tends to permeate every phase of life, and ultimately makes impossible the solution of individual and social problems (7, p.27). It may well be asked whether those involved in thesaurus design sense any responsibility for reflecting the dynamic wholeness of reality, as opposed to providing an efficient "warehouse parts-list" of its currently recognized components.
Would it not be beneficial to consider the need for, and the possibility of, totally different kinds of thesaurus design philosophy which would reconcile (in a creative, dynamic manner) the relationship between structured and associative approaches to ordering knowledge? The paper is also a plea
Finally, an apology for a basic inconsistency in this paper: since the list of biases constitutes a primitive thesaurus, it would have been preferable (given the argument of this paper) to structure each bias as a verb in order to shift the whole discussion into a more dynamic mode.
1. Anthony Judge. Knowledge-representation in a computer-supported environment. In: International Classification, 4, 1977, 2, pp.76-81
2. Anthony Judge. Information mapping for development. In: Transnational Associations , 31, 1979, 5, pp.185-192
3. Anthony Judge. Relationship between elements of knowledge; use of computer systems to facilitate construction, comprehension and comparison of concept thesauri of different schools of thought. (Working Paper No. 3, COCTA, 1971)
4. Anthony Judge. Representation, comprehension and communication of sets; the role of number. In: International Classification, 5, 1978, 3, pp.126-133; 6, 1979, 1, pp.15-25; 6, 1979, 2, pp. 92-103
5. Anthony Judge. Societal learning and the erosion of collective memory. (Prepared as the introductory report on utilization of international documentation for Panel III of the 2nd World Symposium on International Documentation, Brussels, 1980.) In: International Information for the 80s; Proc. of the 2nd World Symposium on International Documentation. Unifo, 1982.
6. H. G. Burger. The transitive taxonomy: classification by the grading of processes. In: CONTA Conference Proc., Bielefeld, 24-27 May 1981.
7. David Bohm. Wholeness and the Implicate Order. Routledge and Kegan Paul 1980.
8. E. T. Hall. Beyond Culture. Doubleday, 1976. Also: The Silent Language; The Hidden Dimension
9. Enzyklopädie des Amenemope (1250 v. Chr.). In: Dahlberg, I.: Grundlagen universaler Wissensordnung. Verlag Dokumentation, 1974. Taf. A1
10. Ingetraut Dahlberg. Gliederung aus "Das Buch des Wissens" von Avicenna (980-1037). In: Dahlberg, I.: Grundlagen universaler Wissensordnung. Verlag Dokumentation, 1974. Taf. A2
11. Jean Aitchison (Comp.). Unesco Thesaurus. Unesco 1977
12. Inter-Organization Board for Information Systems. Broad Terms for United Nations Programmes and Activities. Geneva: 1979.
13. J. W. von Spronsen. The Periodic System of Chemical Elements; a history of the first hundred years. Elsevier 1969.
14. J. W. Botkin, M. Elmandjra and M. Malitza. No Limits to Learning. Pergamon, 1979. ("A Report to the Club of Rome")
15. Union of International Associations. Directory of National Participation in International Organizations. Brussels: Union of International Associations, 1982. (microfiche only; available since then as Vol 2 of the Yearbook of International Organizations)
16. Union of International Associations. Transnational Action Yellow Pages. Brussels: Union of International Associations, 1982. (microfiche only; available since then as Vol 3 of the Yearbook of International Organizations)
17. Francisco Varela. A calculus for self-reference. International Journal of General Systems, 2, 1975, pp. 5-24
18. Ingetraut Dahlberg. Ontical Structures and Universal Classification. Bangalore, 1978.
19. Torre della Sapienza. Firenze, Bibl. Med. Laurenziana, MS Pluteo 30.24, c 1 r ("Studiolo" scomparto 2). Reproduced in: C B Ceppi and N Confuorto. Il Potere e lo Spazio; la scena del principe, Scala, Electa Editrice / Centro di Edision Alinari. 1980.
20. Anthony Judge. Needs communication; viable needs patterns and their identification. In: Lederer K. (Ed): Human Needs. Königstein: Verlag Anton Hain 1980. p.279-312
21. L Berti. Il Principe dello Studiolo; Francesco I del Medici e la fine del Rinascimento fiorentino. Firenze: Edam 1967.
22. Lao Tzu. Tao Te Ching.
23. Anthony Judge. Liberation of integration; pattern, oscillation, harmony and embodiment. (Prepared for the 5th Network Meeting of the UN University's project on Goals, Processes and Indicators of Development, Montreal, 1980)
24. Antonio de Nicolas. Meditations through the Rg Veda. Shambhala 1978.
25. Patrick A. Heelan. The logic of changing classificatory frameworks. In: Wojciechowski, J.A. (Ed): Conceptual Basis of the Classification of Knowledge. K G Saur 1974. p.260-274.
26. C. A. Hooker. The impact of quantum theory on the conceptual bases for the classification of knowledge. In: Wojciechowski, J.A. (ref. 25)
27. Marcel Granet. La pensée chinoise. Albin Michel 1968.
This work is licensed under a Creative Commons licence.