September 1971

Design of an information system to facilitate the production of concept thesauri by different schools of thought


Notes on the design of an information system to facilitate the production of concept thesauri by different schools of thought. Notes presented to a workshop of the Committee of Conceptual and Terminological Analysis (COCTA) of the International Political Science Association (Bellagio, September 1971) and subsequently developed into a more extensive proposal, Relationship between Elements of Knowledge: use of computer systems to facilitate construction, comprehension and comparison of the concept thesauri of different schools of thought (1971).

Introduction

In a discussion about any form of thesaurus, it would seem to be important to distinguish between the problems of identifying and processing the entities to be included and the problem of classifying these entities or rejecting some on the basis of a particular set of criteria. The first problem is a much simpler one and its solution, if sufficiently general, can be arranged so as to minimize controversy and therefore maximize acceptance of the procedure. The second problem is more complex and the solution may raise real or perceived theoretical issues or even be perceived as a threat by certain schools of thought. This could undermine any attempt to solve the problems of conceptual anarchy.

In these notes some of the difficulties are summarized and an attempt is made to demonstrate how adequate structuring of the required information system may help to eliminate or isolate these problems in such a way as to render the project feasible. These notes do not therefore touch upon any theoretical issues. They are solely concerned with the design of a practical information system to be used as a tool by scholars in order to facilitate solution of such issues.

Difficulties

1. The identification of entities to be included in a thesaurus and the practical problems of incorporating these entities into an information system need to be distinguished from the theoretical problems of classifying and interrelating such entities. The first is a relatively fast and unskilled operation and the second is relatively slow and skilled. This means that the technique of identifying the entity within the system by a numerical tag derived from a classification scheme should be avoided, since the fast operation of registration would then have to wait upon the slow operation of classification.

2. The experience of the success of the Académie Française with respect to the French language, and of the International Federation for Documentation with respect to the Universal Decimal Classification of subject areas, would seem to indicate that such approaches tend to:

  • be slow in responding to change, to the point of acting as a constraint on innovation for those dependent upon them (the UDC Committees in some areas are rumoured to be 10 to 15 years behind in coding the backlog)
  • give rise to a proliferation of competing alternatives (e.g. UDC, Dewey and the UN/OECD Aligned List of Descriptors) for groups of users with slightly different perspectives on subject areas, who need a tool with slightly different properties
  • become associated with particular schools of thought, organizations or personalities who resent criticism of their perspective and alienate potential collaborators
  • become viewed as authoritarian and a vehicle of some form of conceptual imperialism. Unfortunately, the organization of relations between entities is equated with the imposition of a new set of relations. The organizers are perceived as acquiring power.

These points raise the important question of providing an adequate degree of flexibility and responsiveness to the needs of future users and those in other disciplines and schools of thought (whose theoretical requirements it is clearly difficult to predict). At the same time, there is the problem of ensuring that the system meets the needs of those who initially invest effort in the undertaking.

3. Overdesigning the information handling system to meet immediately perceived needs may reduce its usefulness and relevance to others and therefore increase the difficulty of ensuring adequate funds over a long period. (The degree of "hygiene" may be inversely proportional to the utility or relevance of the system to potential users.)

4. The actual procedures for incorporating new entities into any 'approved' list within the system may appear bureaucratic and stultifying unless the system is user-oriented. There is therefore the old problem of minimizing the bureaucratic desire for due process and order and maximizing user participation.

5. Any single classification scheme may come to be treated by some funding bodies as a basis for their system of resource allocation for research. This tendency might be encouraged by the strategies of those seeking research support and could artificially distort research directions.

6. There is an important possibility that the classification scheme may constitute or operate as some form of paradigm (or perhaps "meta-paradigm"). Whilst this is acceptable for the solution of immediate problems of conceptual hygiene it may in the long-term encourage scholasticism. The design of alternatives to a particular classification scheme, or its redefinition, should therefore be encouraged.

7. Some thought needs to be given to the potential users of the thesaurus, and the manner in which use of it might be facilitated. The initiative comes from one school of thought in political science. How can it be made useful to:

  • other schools of thought
  • social science disciplines (in a narrow sense)
  • social science disciplines (in a broad sense)

8. The assumption has been made that the major users would be scholars or students. There may however be possibilities of wider use, which raise the problem of how users can introduce filters to eliminate excessive detail and other features of the system which are not immediately essential.

Proposal

1. A computer-based concept registration or tagging system should be set up which would allocate sequence numbers to concepts on a continuing basis. The criteria for concept registration should be kept to a minimum to ensure that the system remains "open" to a wide variety of users and contributors.

This approach permits rapid inclusion and organization of the data and rapid production of updated concept lists. These would facilitate the scrutiny of the data in various phases and in terms of the perceptions of different need groups.
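The registration step described above can be sketched as follows. This is only an illustration under stated assumptions: the class name `Registry`, its methods and the sample labels are hypothetical and do not come from these notes. It shows that a sequence number can be allocated without reference to any classification scheme, so that two entities sharing a label remain distinct.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Registry:
    """Hypothetical register that tags entities with sequence numbers."""

    _next: count = field(default_factory=lambda: count(1))
    entries: dict = field(default_factory=dict)

    def register(self, label: str, definition: str = "") -> int:
        # Registration criteria are deliberately minimal: a label is enough.
        seq = next(self._next)
        self.entries[seq] = {"label": label, "definition": definition}
        return seq

    def listing(self) -> list:
        # Updated concept list, in order of registration.
        return [(seq, e["label"]) for seq, e in sorted(self.entries.items())]


registry = Registry()
power = registry.register("power")
state = registry.register("state")
power_other = registry.register("power")  # same label, separate entity
```

No classification code appears anywhere in the register; any ordering other than order of arrival is left to the contributing groups.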

2. Evaluation, classification and identification of concept inter-relationships would be made independently by a limited number of contributing groups, possibly associated organizationally with the international academic bodies. These groups would be primarily concerned with allocating codes to be fed back to the computer system so that ordered and refined concept thesauri could be produced to reflect the perceptions and needs of the contributing groups. An important aspect of this coding function by groups would be the rejection of those registered concepts which are considered to be of little value to the group's perspective.

From the computer data handling point of view, each contributing group would be building, refining and maintaining its own "model". Each such model would be handled as an independent optional qualifier on the sequentially ordered concept list.

From the point of view of any such group, the computer system would be viewed as holding the concepts in which it is interested in the order of its own preferred classification scheme.

There would of course be the opportunity at any time to look at the same concept list through the classification scheme of any other contributing school of thought.
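One way to picture a model as an "independent optional qualifier" on the sequential list is sketched below. The model names, the classification codes and the function `view` are all hypothetical, and codes are simply sorted as strings, which is an assumption made for brevity rather than a feature of any real scheme.

```python
# Shared sequential concept list: sequence number -> label.
concepts = {1: "power", 2: "state", 3: "legitimacy", 4: "anomie"}

# Each model maps sequence number -> that group's classification code.
# A concept absent from a model has been rejected (or not yet coded)
# by the group concerned.
models = {
    "model_A": {1: "2.1", 2: "1.0", 3: "2.2"},
    "model_B": {4: "A", 1: "C", 3: "B"},
}


def view(model_name):
    """Return (code, label) pairs in the order of the chosen model."""
    codes = models[model_name]
    ordered = sorted(codes.items(), key=lambda item: item[1])
    return [(code, concepts[seq]) for seq, code in ordered]


view_a = view("model_A")
view_b = view("model_B")
```

The shared list is never reordered or edited by a model; switching from one school's perspective to another is only a change of the argument passed to `view`.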

3. Once the concept registration system is running smoothly and the professional groups are interacting effectively with the system to feed back their classification of the concepts within their own models, other groups of different levels of "multi-disciplinarity" may constitute themselves to work on the integration into "meta-models" of two or more of the models already produced (e.g. for political science and sociology into a social science model).

4. There is no reason why, for example, a copy (on computer magnetic tape) of the concept list and various models should not be made available to universities for comparative research on the models or as a tool in the educational process. Alternative models could be constructed which could be made generally available.

With respect to research, it is clearly important to enable the user to examine the thesauri at different levels of abstraction by introducing filters. In addition there is the possibility of comparative study of the manner in which different disciplines perceive and interrelate phenomena.

With respect to education, it is possible to develop educational meta-models which would permit selection of concepts by filters corresponding to different educational levels (e.g. an "atom" may be viewed as a billiard ball-type structure in the elementary stages, then as a miniature solar system, then as a system of electrically charged potential clouds or, in the final stage, as something which can only be described with mathematical symbols). At each level a precise definition in the appropriate terms could be provided. In addition the approach could permit individual students to create their own concept thesaurus and to learn from the differences between their own and those of particular disciplines.
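The idea of a filter keyed to educational level can be illustrated with the "atom" example. This is a sketch only: the integer levels 1 to 4, the `Description` class and the function `describe` are assumptions introduced here, not part of the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Description:
    level: int  # hypothetical educational level, 1 = elementary
    text: str


atom = [
    Description(1, "billiard ball-type structure"),
    Description(2, "miniature solar system"),
    Description(3, "system of electrically charged potential clouds"),
    Description(4, "describable only with mathematical symbols"),
]


def describe(descriptions, level):
    """Return the most advanced description not exceeding the given level."""
    eligible = [d for d in descriptions if d.level <= level]
    if not eligible:
        return None
    return max(eligible, key=lambda d: d.level).text


elementary = describe(atom, 1)
intermediate = describe(atom, 2)
```

The same filter could equally be applied to levels of abstraction in research use; only the meaning attached to `level` would change.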

5. Each contributing group may wish to distinguish differently between or interrelate the "entities" tagged in the computer sequential register. There is no reason why "concepts", "propositions", "relationships", "problems" etc should not all be treated as entities and appropriately distinguished and interrelated (or ignored and rejected) at the modelling phase. It might, for example, be particularly valuable to include "theories", "frameworks of inquiry", etc. by first giving each a sequential number (as indicated above) and then (in the modelling phase) relating them to the major variables considered significant and necessary to define the frame of discourse associated with that theoretical viewpoint.

This would permit the same system to handle concept thesauri, inventories of propositions, inventories of problems, etc.

6. At a later stage users of one model might find it useful to produce an "authoritative" list of terms to be used for those concepts of interest to them. This could also be incorporated into the computer system.

7. It would require further study to determine whether models need to demonstrate concept interrelationships by using the decimal classification scheme or whether a more economical and flexible computer technique could not be developed. The latter might avoid the need to have both a sequential tag and a decimal number. Instead a concept would be identified by its sequential number plus a number to uniquely identify the model.
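As one possible reading of the alternative to decimal numbering, relationships could be stored as explicit links keyed by model number and sequence number. The sketch below is hypothetical throughout (the names `depends_on`, `relate` and `ancestors`, and the sample numbers, are assumptions); it is not a claim about which technique further study would favour.

```python
from collections import defaultdict

# (model number, sequence number) -> sequence numbers it depends upon.
depends_on = defaultdict(set)


def relate(model, seq, parent_seq):
    """Record that, within one model, entity `seq` depends on `parent_seq`."""
    depends_on[(model, seq)].add(parent_seq)


def ancestors(model, seq):
    """Collect every entity that `seq` depends on, directly or indirectly."""
    seen = set()
    stack = [seq]
    while stack:
        current = stack.pop()
        for parent in depends_on.get((model, current), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Model 1 places entity 3 under 2, and 2 under 1; model 2 places 3 under 1.
relate(1, 3, 2)
relate(1, 2, 1)
relate(2, 3, 1)
```

Because the position of a concept is held in the links rather than encoded in its identifier, a group can rearrange its model without renumbering anything.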

8. Since one of the great disadvantages of computer use is the tendency to generate long, indigestible and impenetrable lists (however ordered), some attention could usefully be paid to the technique of displaying networks of concepts directly onto television-type screens under computer control. The user can then penetrate and interact with the network and can readily have:

  • networks at different levels of the "ladder of abstraction" displayed (i.e. the computer can be used to explore conceptual schemes nested within one another like Chinese boxes)
  • explanatory texts added or eliminated to clarify concept interrelationships. (This is particularly useful for the successive clarification of conceptual schemes at gradually increasing levels of complexity, in the educational sense.)

Data to be included on each entity

A. Identification or Registration Phase

1. Entity sequential number

2. Conventional labels or terms

2.1 Language 1 data
2.1.1 Language code
2.1.2 Label in language (of code given in 2.1.1)
2.2 Language 2 data etc

3. Text of definition

3.1 Language 1 definition
3.2 Language 2 definition

4. Sequence numbers of other entities with same label

B. Classification or Modelling Phase

1. Model number

2. Type of entity code (e.g. concept, proposition, problem, etc.)

3. Usage

3.1 Date first used
3.2 Date last used

4. Relative importance code

5. Compositional relationship coding (i.e. sequence numbers of entities representing concepts: (i) which are dependent upon this concept, (ii) upon which this concept depends, (iii) which are horizontally related to this concept)

6. Comprehension relationship coding (i.e. sequence numbers of entities representing concepts as in B.5 but reflecting relationships between concepts in terms of ease of comprehension)

7. Systemic relationship coding (i.e. sequence numbers of entities representing the systems corresponding to concepts and reflecting the manner in which such systems are nested or interact with one another)

C. Authoritative Term Phase

1. Text of authoritative term

1.1 Language 1 term
1.2 Language 2 term
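The field list above might be held as two kinds of record, one per entity and one per entity within a model. The sketch below is an assumption about layout, not a specification: the class and field names are hypothetical, dates are kept as plain strings, and the authoritative terms of phase C are attached to the model record on the reading that they belong to the users of one model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RegistrationRecord:
    """Phase A: one record per registered entity."""

    sequence_number: int                                # A.1
    labels: dict = field(default_factory=dict)          # A.2 language code -> label
    definitions: dict = field(default_factory=dict)     # A.3 language code -> text
    same_label_as: list = field(default_factory=list)   # A.4 sequence numbers


@dataclass
class ModelRecord:
    """Phases B and C: one record per entity per model."""

    sequence_number: int
    model_number: int                                   # B.1
    entity_type: str                                    # B.2
    first_used: Optional[str] = None                    # B.3.1
    last_used: Optional[str] = None                     # B.3.2
    importance: Optional[int] = None                    # B.4
    compositional: dict = field(default_factory=dict)   # B.5 kind -> sequence numbers
    comprehension: dict = field(default_factory=dict)   # B.6 kind -> sequence numbers
    systemic: list = field(default_factory=list)        # B.7 sequence numbers
    authoritative_terms: dict = field(default_factory=dict)  # C.1 language -> term


entity = RegistrationRecord(1, labels={"en": "power", "fr": "pouvoir"})
coding = ModelRecord(
    sequence_number=1,
    model_number=7,
    entity_type="concept",
    compositional={"depends_on": [2]},
)
```

Keeping the two records separate mirrors the distinction drawn throughout these notes: the registration record can exist before any group has coded the entity.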

This work is licensed under a Creative Commons licence.