1973

Project Features toward a Concept Inventory


Originally appeared in 1973 as part of Toward a Concept Inventory


Organization of project

The success of a project of this type would be dependent upon the extent to which any central organisation can be avoided in favour of a process of catalysis. There is too much to be done to run the risk of the usual jurisdictional, behavioural, and personality problems associated with a centralised organisation. Such problems rapidly alienate potential support. The problem is therefore to bring into existence a decentralised network of groups working on different aspects of the project, but able to exchange the results of their activities without difficulty.

(It is important to remember that it is probably impossible to "organise" a whole area of knowledge because the latter is well subdivided into territories and "stamping grounds" whose incumbents are reasonably content with the current situation. It may, however, be possible to offer them a reasonably neutral device by which they can each facilitate and order their own particular approach, and, as a by-product, see more clearly its relationship to that in other "neighbouring" territories. Having by this means obtained a decentralised picture of the current situation, it is then possible, in a totally distinct process, to lobby the incumbents into participating to some degree in inter-territory efforts at organising areas of knowledge whilst guaranteeing safeguards for the protection of their "sovereignty".)

A. Launching Phases

A number of phases can be envisaged, some of which could overlap.

1. Investigation: During this phase the project would be investigated in detail by circulating proposals among appropriate specialists. The main object would be to ensure that the proposal is oriented in the right direction, and that funds for pilot projects are obtained. This phase may be considered to be underway already, through the actions of the COCTA committee.

2. Pilot Projects: During this phase efforts would be devoted to the following areas:

a) computer program development and file organization.

b) operational and logical problems of classification with models in a few test areas.

c) computer simulation of file movement and modeling activity in a decentralized, minimum-organization environment. It would be particularly valuable to gain some insight into the behavioural problems of rivalry and suspicion between model building groups, and efforts to "take over" the system.

d) computer simulation of different strategies to keep the system "open" to theoretical formulations from as wide a range of sources as possible whilst trying to minimize the inclusion and retention of formulations of dubious value.

3. Agreement on Standard Formats: On the basis of the previous phase, standard formats for filing new formulations and for holding them on magnetic media would be agreed. Since this is a new type of project, it should not encounter the apparently insurmountable difficulties of those concerned with organizing the computerized exchange of bibliographical information.

4. Production of Standard Software: Once agreement has been reached, a standard software computer program can be made available to all those bodies which wish to initiate some concept modeling activity, to act as a central filing point for their particular constituency. It is possible that initially only one body will be active, possibly as an extension of the pilot project stage.

5. Filing Procedure: Once a standard filing or registration form is developed, there should be no difficulty for any group in receiving and filing identified concepts. This can of course be done by mail.

By filing is meant the purely administrative activity of preparing the forms for the computer. There should be a minimum of judgmental effort at this stage, and none with respect to the theoretical problems of the subsequent modeling activity. The object is to get the incoming information into a form which facilitates the activities of the members of the modeling bodies.

The area of difficulty which does require examination is that of how to decide who should not be permitted to submit concepts for filing into the common data base. This point is considered below.

B. Periodic Operations

1. Lists of Formulations: Periodically the sequence of identified concepts held on magnetic tape should be scanned to produce lists for circulation to the modeling bodies and, if required, their members. Two types of lists can be envisaged.

a) lists of newly-registered concepts which must be scanned by each modeling body to see whether they are in any way relevant to its concerns

b) lists of the complete sequence of concepts for newly formed modeling bodies wishing to re-examine all possible formulations and interrelate them in their own way.

2. Modeling or classification: The lists derived from the previous operation can be examined by the modeling bodies in committee or distributed by post to their members. From these (postal) deliberations should emerge a collective opinion on the place within the classification scheme, of each identified concept reviewed. If necessary, a "provisional" view can be formulated by the use of appropriate coding. In fact this might be a most useful way of submitting a committee's view for wider consideration. Different degrees of "definitiveness" could thus be envisaged.

3. Feedback of Model Information: The details of the place of the concept within a particular model would be indicated on a standard form which could be returned by post for keypunching and incorporation. A modification of this approach would be to permit individual committee members to each return forms for any new entity under consideration. In this way all the alternatives would be incorporated into the model with some "provisional" code so that each member could see the proposals of the others, and their implications. In some cases, this could even be operated as a means of postal voting on the treatment of controversial concepts. The administrative load of the committee is in this way largely computerized.

4. Input of Model Information: The forms from each modeling body would be handled at the central registry point (for that constituency), keypunched and fed onto the magnetic tape file. Keypunching errors would be corrected there as far as possible.

5. Production of Model Amendment Lists: Whenever required, the concepts incorporated into a given model would be selected and sorted into the thesaurus-type structure appropriate to the model and listed for distribution back to the members of the modeling body. This gives members an updated model with all the concepts coded to different levels of "provisionality".

Members can then reconsider their views and proceed from Operation 2 above or, alternatively, for those formulations which have been classified to the agreement of all concerned, the term allocation operation may be initiated.

6. Allocation of Model Terms: Working from the concepts structured into a thesaurus-type order, members can allocate terms to each entity in English and whatever other languages are considered necessary. Again, there is no reason why "provisional" coding should not be used to cover various working cycles of term allocation.

7. Feedback of Term Information: As with model information, the alphanumeric terms allocated to each concept can be indicated on a standard form which could be returned by post for keypunching and incorporation onto magnetic tape.

8. Input of Term Information: The forms from each body allocating terms within a model would be handled at a central registry point, as with the model information itself.

9. Production of Term Lists: Whenever required, the concepts incorporated into the model would be selected and sorted into term lists, either in alphanumeric order or in terms of a thesaurus-type structure. This gives members an updated model expressed in terms coded to different levels of "provisionality". Members can then reconsider their views and proceed from Operation 6 above.

It is clear that the above operations permit a quite extensive degree of "de-committeefication". Members of a modeling body can individually register their views and preferences by post on each concept in the model and in their own time. The resulting lists are circulated and amended to firm up progressively the consensus on each point until final agreement can be reached. Alternatively, if there is a final difference of opinion, then this can be registered as such. Actual discussion need only take place when the accumulation of cases (which cannot be handled by correspondence and a "modeling bulletin" mechanism) merits such contact.
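The cycle of Operations 2 to 5 can be pictured as a tally of postal returns. The following is only a minimal sketch under assumptions that are not in the original: placements are plain strings, the names `tally_placements` and `provisionality` are hypothetical, and a placement is treated as "final" only when every returned form agrees.

```python
from collections import Counter

def tally_placements(returns):
    """Group postal returns by concept and count each proposed placement.

    `returns` is a list of (member, concept_number, proposed_class) tuples,
    one per form sent back by post.
    """
    tally = {}
    for _member, concept, proposed in returns:
        tally.setdefault(concept, Counter())[proposed] += 1
    return tally

def provisionality(counter):
    """Return (leading placement, status) for one concept.

    Assumed rule: 'final' if all returns agree, otherwise 'provisional'.
    """
    placement, votes = counter.most_common(1)[0]
    status = "final" if votes == sum(counter.values()) else "provisional"
    return placement, status

# Illustrative returns; the class codes are invented for the example.
returns = [
    ("member_a", 101, "B.2"),
    ("member_b", 101, "B.2"),
    ("member_c", 101, "C.1"),
    ("member_a", 102, "A.4"),
    ("member_b", 102, "A.4"),
]
amendment_list = {
    concept: provisionality(counts)
    for concept, counts in tally_placements(returns).items()
}
```

Each further round of returns would be tallied in the same way, so a concept moves from "provisional" to "final" only when the dissenting forms are withdrawn or the disagreement is registered as such.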

C. Subsequent Phases

A number of phases can be envisaged which follow on from those detailed in "A" above. They do not, however, modify the basic operations noted in "B".

1. File Movement: One of the disadvantages of isolated registration points is that concepts common to two or more constituencies will not necessarily be juxtaposed. In particular, unless each such point is allocated a block of sequence numbers, there are liable to be overlapping sequence numbering systems which would jeopardize the whole project.

One means of avoiding this, aside from allocating blocks of numbers to each registering point, is to circulate copies of the files between registration points. (Either the tapes themselves could be moved or data links could be used.) This might be considered a standard procedure by which duplicates in all newly-coded concepts could be located and grouped together for consideration by each of the interested modeling bodies, prior to arriving at a "final" decision.

The circulation of such information can be made very rapid. A courier file can be circulated between the registration points for a particular discipline. Information is copied onto and off the file at each such sub-speciality point. At one point in its movement, such an intra-discipline file could interact with an inter-discipline file (e.g. for disciplines in the same group) to permit a similar two-way transfer to take place. Similarly a higher level courier file moving between groups of disciplines could permit further exchange.

In this way cross-disciplinary confusion could be avoided. Clearly refinements are possible by using mission-oriented files or geographical area files. The system is very flexible. It could even be made to interact with "classified" files by using security, subject matter and evaluation filters to govern the interaction.
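The courier-file mechanism can be sketched as a simple two-way copy at each registration point. This is an illustration only: the function names, the use of dictionaries keyed by sequence number, and the sample concept labels are all assumptions made for the example, and pre-allocated blocks of numbers are assumed so that keys from different points never collide.

```python
def exchange(courier, local):
    """Two-way transfer at one registration point.

    Records are keyed by sequence number; both stores end up with the union.
    """
    courier.update(local)   # copy local records onto the courier file
    local.update(courier)   # copy everything else off the courier file

def circulate(points, rounds=2):
    """Carry one courier file around the registration points in order."""
    courier = {}
    for _ in range(rounds):
        for local in points:
            exchange(courier, local)
    return courier

# Hypothetical registration points, each with its own block of numbers.
sociology = {1001: "role conflict"}
economics = {2001: "opportunity cost"}
politics = {3001: "legitimacy"}
points = [sociology, economics, politics]
circulate(points)
```

In this sketch a single circuit leaves the first point holding only its own records, since the courier file was empty when it called there; a second circuit brings every point up to date.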

The key feature is that it does not require more than a bare minimum of overall organization or funding. It can be extended very loosely in response to the initiative of any highly specialized discipline. Registration points are created wherever (in terms of subject, jurisdiction or geographical level) there is sufficient common interest -- i.e. motivation plus resources. This gets around the current situation in which vain attempts are made to get significant funding for general multi-purpose projects, particularly via any international program.

If cross-jurisdictional problems arise in particular areas, all the administrative work there may be delegated under contract to some party judged to be impartial and uninvolved - a commercial computer service bureau, a university, or a government agency - or a user cooperative might be organized.

The costs involved at each collecting point are:

(a) conversion of information and queries to machine-readable form

(b) processing and output relevant to immediate user contacts

(c) transport costs of the courier file to the next collecting point.

The funds are expended locally in a manner which can be immediately justified and yet this results in making available current information from points very conceptually distant within the system.

D. Accredited Sources

It is clearly an advantage to allocate responsibility for modeling group activity in a particular domain to the appropriate international professional organization.

The difficulty arises in determining which sources of information should be recognized by such modeling groups. In the earlier phases when the group is working through the standard texts, few problems should arise. But once a model is available for inspection, problems will arise in determining whose suggestions for additions or amendments should be accepted. Within a well-defined profession this difficulty may be avoided by recognizing only accredited members of the profession. The right to submit amendments then becomes a right accorded by the profession. This procedure will undoubtedly lead to conflict when areas common to a number of disciplines are considered (e.g. the social sciences, in general), unless each discipline is restricted to its own model.

A distinction should also be made between the right to file an entity and the right to suggest amendments to the model. There is some advantage in giving wider access for filing, but limiting the "retention period" of the entity filed according to the professional standing of the filer.

A later development could be the possibility of retaining entities only if a supporting "vote" was registered by an appropriate number of appropriately accredited persons. The degree of support could be a real time measure of the degree of significance to the discipline of a given theoretical formulation.
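The two safeguards just described, a retention period tied to the filer's standing and retention by supporting votes, can be combined in a short sketch. The field names, the day-count representation of time, and the `quorum` parameter are hypothetical; the original does not fix any of these details.

```python
from dataclasses import dataclass, field

@dataclass
class FiledEntity:
    sequence_number: int
    filed_on_day: int
    retention_days: int            # assumed to depend on the filer's standing
    supporters: set = field(default_factory=set)

def is_retained(entity, today, accredited, quorum):
    """Keep an entity while its retention period runs, or once enough
    accredited persons have registered a supporting vote."""
    within_period = today - entity.filed_on_day <= entity.retention_days
    support = len(entity.supporters & accredited)
    return within_period or support >= quorum

# Illustrative data: two of the three supporters are accredited.
accredited = {"dr_a", "dr_b", "dr_c"}
entity = FiledEntity(
    sequence_number=4512,
    filed_on_day=0,
    retention_days=90,
    supporters={"dr_a", "dr_b", "visitor"},
)
```

Here `is_retained(entity, today=120, accredited=accredited, quorum=2)` keeps the entity after its retention period has lapsed, because two accredited supporters are enough; with a quorum of three it would be dropped.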

Whatever procedure is adopted, it is essential, for the vitality and general relevance of the project, that a wide range of people and organizations should be in a position to add entities to the file - given a few simple safeguards. In this way the interests of every relevant discipline, school of thought, problem area, "approach" or paradigm should be protected. The system would therefore be "open" to social scientists writing in any language or taking any epistemological or ideological position.

Classification and modeling

1. Nature of classification

There is considerable terminological variation in the scientific literature in the use of the term "classification". Dalenius and Frank, after making this observation (3), define the term as follows:

"Consider a collective of objects of some kind and a set of mutually disjoint classes. Every object belongs to one, and only one, of these classes. By classification we will denote the act of assigning the objects into these classes. In taxonomy, classification indicates the act of creating classes according to some principle, the term "identification" is used for classification as used in this paper. By the same token, the term "coding" is rather ambiguous. We refrain from its use here, but mention that classification as used in this paper is referred to as coding in the literature dealing with e.g. population censuses."

This definition, whilst appearing to be inclusive, in fact only covers one type of classification, namely hierarchic classification where classes are mutually disjoint. Classification of theoretical formulations is one area in which classes may or may not be mutually disjoint.

J.H. Shera has made an excellent general assessment of the problems of general library classification in an article of his, originally published in 1953 and reprinted in his book, Libraries and the Organization of Knowledge. He concludes that the hierarchical form in itself is not a sufficient basis for the classification of knowledge and that what is required is a directed graph, or non-hierarchic representation.

The relationship between hierarchic and non-hierarchic classification schemes has been the subject of considerable work by Jardine and Sibson (4). They are particularly interested in the stability of the classification produced by a given method as the amount of information (or number of attributes) is increased for the entities being classified. They are looking for measures of distortion introduced by the imposition of a given classification scheme.

This work makes it clear that the process of classification can introduce distortion and that this can be avoided by using a directed graph representation. In this project the distinction is made between the filing process, the classification process, and the term allocation process.

It is useful to think of the first stage of the classification process as one of "relationship indication", in which the relationships of a given theoretical entity with other entities are inserted. This results in a "directed graph" network of entities which can be searched by computer, particularly to detect clusters with certain properties. This stage corresponds to the determination of similarity or dissimilarity between entities.

In a second stage, the above network can be distorted so that its elements can be fitted into a chosen set of classes with a certain relationship to one another. This is "classification" as opposed to the previous phase which inserts relationships irrespective of any class boundaries. It is convenient to call this activity "modeling". Clearly the modeling activity is a valuable preliminary to "classification". It is particularly valuable in that once completed, different systems of classification can be compared using the entities inter-related by the model, i.e. different degrees of distortion can be imposed upon the network of entities according to the immediate needs of the user. It may be useful to think of modeling in this context as a long-term multi-person activity, whereas a given classification can be selected from the modeled entities in terms of short-term, need-oriented considerations which permit certain relationships in the network to be considered as "irrelevant" -- permitting the isolation of simple, possibly hierarchic, classification schemes. In some cases, it may however be preferred not to distinguish modeling from classification and to blur the two operations into one another.
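The two stages can be illustrated with a very small directed graph. The sketch assumes relationships are stored as (entity, relation type, entity) triples using sequence numbers, and the relation names and the "keep the first parent" rule are invented for the example; they stand in for whatever distortion a particular classification scheme would impose.

```python
# Stage 1, "relationship indication": a network with no class boundaries.
relationships = [
    (1, "component_of", 10),
    (2, "component_of", 10),
    (2, "component_of", 11),   # entity 2 belongs to two wholes
    (1, "see_also", 2),
]

def select(relationships, relevant_types):
    """Treat every relationship outside `relevant_types` as irrelevant."""
    return [r for r in relationships if r[1] in relevant_types]

def is_hierarchic(edges):
    """True if no entity has more than one parent, i.e. classes are disjoint."""
    children = [child for child, _rel, _parent in edges]
    return len(children) == len(set(children))

def force_hierarchy(edges):
    """Keep only the first parent recorded for each entity (a crude, assumed rule)."""
    kept, seen = [], set()
    for child, rel, parent in edges:
        if child not in seen:
            kept.append((child, rel, parent))
            seen.add(child)
    return kept

# Stage 2, "classification": a need-oriented selection from the network.
compositional = select(relationships, {"component_of"})
hierarchy = force_hierarchy(compositional)
dropped = len(compositional) - len(hierarchy)   # one crude measure of distortion
```

The network itself is left untouched, so a different user could apply a different selection to the same modeled entities and compare the results.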

2. Filing and Classification

In the case of document indexing applications, no distinction is made between filing and classification. Because of this, the administrative problem of filing and the qualified expert problem of classification combine to create severe problems.

The UNISIST Study (5) noted that little progress can yet be reported in the way of indexing-at-source and that a serious limiting factor to any form of cooperative indexing is the range of acceptability of the proposed indexes. Even the all-embracing and widely used U.D.C. has adversaries. The Study also noted that it is unlikely that the concept of a universal scheme will ever make any practical sense in the realm of deep content analysis (p.46). The reasons are the observed differences in the semantic basis of indexing languages which are the consequence of well-founded differences in outlook and interests on the part of a highly-diversified community of users.

All that can be looked for, according to the UNISIST Study, "is the existence of semantic relations between the different lexical sets (be they called classifications, lists of disciplines, thesauri, automatic dictionaries for converting natural language into information language, etc). The study of these relations is the subject of ongoing research on the "compatibility" of indexing vocabularies .... the subject is now receiving much attention as an essential part of projects aimed at establishing world-wide interconnections between information systems." (p.46)

It would appear from this that the distinction between the impracticalities of classification and the practicalities of "relationship identification" (i.e. modeling) is becoming established. But the filing or administrative aspect of "entity capture" is now blurred into the modeling phase. There is as yet no suggestion that work on "compatibility" would be considerably facilitated if similar filing techniques were used prior to the activity at the modeling level at which the "well-founded" theoretical differences arise. Standardization is possible, but at a lower level consistent with user requirements. Until this is realized the relationship between lexical sets cannot be handled systematically by computer methods.

3. Advantages of Numerical Filing System

The three major advantages of a sequential, non-significant numbering system for entities are:

Entities, relationships and models

1. Types of Entity Included

There is a very varied terminology currently in use to characterize theoretical products. Gunnar Sjöblom notes the use of conceptual (analytical, theoretical) frameworks, analytical schemes, paradigms, orientations, frameworks for inquiry, theory-sketches, pre-theories, etc.(7) The same is true for the components of the scientific process: problems, observations, empirical generalizations, models, derived propositions, hypotheses, theories, etc. It is unlikely that any immediate agreement could be achieved on a standard terminology, even if this was in fact beneficial.

Each of the conceptual constructs represented by the above terms may be treated as an "entity" which could be incorporated into a computer file. Once incorporated, efforts could be made to attach an appropriate distinguishing code to them within the framework of a given model. It is highly probable, for example, that under different models the same entity may be coded differently, or alternatively that distinctions important within one model will be insignificant in another (e.g. theory and model; hypothesis and proposition).

As a summary, the above entities are numbered below to facilitate discussion on possible groups of entities:

A. Concepts

B. Meta-concepts

C. General

D. Assumptions

E. Methods

F. Problems

G. Hierarchies

H. Operationalization

I. Data

J. Social

There is some advantage in a two-level coding here, because it might be possible to arrive more easily at agreement on the more general level coding, even if there are differences between models on the coding within that level. There is of course the possibility that within a particular model the grouping would be done differently, in which case the coding scheme would be peculiar to that model.

2. Types of Relationship Included

It is not the intention of this project to set up a single rigid classification of permissible relationships between entities. Just as no effort was made to limit the types of entities that could be handled (see above), it should not be necessary to make the futile attempt to resolve the intellectual problem of how many types of relationship are significant. That the attempt would be futile on the part of any one group is shown by Eric de Grolier's excellent chapters on the expression of relationships in generalized and specialized coding systems, in natural languages, and in experimental languages (8). He concludes, in his UNESCO/FID supported study, that it proved impossible to produce a systematization that was "sufficiently satisfactory to warrant even preliminary publication".

This conclusion should not however lead to a decision to adopt some hypothetical "best existing scheme" or to the formulation of a new scheme. It should be recognized that the project should be capable of handling as many different schemes as possible. In fact the evolution of knowledge is partly represented by attempts to produce new schemes of relationship and categorization.

Without recommending any particular scheme, it is useful to attempt to list some of the relationships to give an idea of the variety that has been envisaged. De Grolier suggested a clarification of the sign ":" in the UDC (rejected by the FID Central Committee on Classification for the UDC) which covered the following relationships:

1.1 Appurtenance (belonging)

1.2 Process

1.3 Dependence

1.4 Orientation

1.5 Comparison

1.51 Resemblance, likeness, similarity

Other typologies of relationships have been formulated by Gardin, Farradane, Perry and Kent, Juilland. Each uses very different and, at least superficially, unrelated categories. The different models envisaged in the next section encompass compositional, behavioural, didactic, historical, cybernetic and problem-oriented relationships.

3. Types of Model

It is important to keep in mind the many possible uses of the proposed computer-based filing system. Concentration on one set of uses may not necessarily keep the system alive either in terms of funding or value to current research activity. Multiple demands on it would ensure multiplicity of fund sources and many bodies willing to feed in entities and assist in different aspects of the coding.

The following types of model are an illustration of the possible lines of development. The list does not pretend to be exclusive so that other kinds of model could be included. An attempt has been made to group the models into types which in some cases might usefully be treated on the same occasion by the responsible modeling group.

It is important to note that the models are not only simple hierarchies but can also be networks of relationships in cases where categories overlap or one entity can be a component of several other entities.

Group 1: Current Structures

This is a poor title but refers to all the current and new structures and relationships as made up of:

1.1 Compositional Models: These models would be primarily concerned with the manner in which entities are nested within one another to form hierarchies. Six types of relationship are possible here in three sets of two.

a) Meta-level: reference numbers of all entities of which this entity is a component.

(This relationship could be split into two sub-types as the computer-level data formats for other types of model require such a split.)

Examples are: theories in which this concept is used, general class of concepts to which this concept belongs, general problems of which this problem is a part, organizations of which this organizational unit is a member.

b) Sub-level: reference numbers of all entities which are components of this entity. (This relationship could be split into two sub-types for the same reasons as above.)

Examples are: concepts used in this theory, concepts which belong to this class of concepts, properties or attributes of this concept, sub-problems of this problem, organizational units which are members of this organization, etc.

c) Associated: reference numbers of all relevant entries which have a horizontal relationship to this entity.

- See-also entities, namely those which should also be borne in mind when considering this entity.

Examples are: cases of insufficient terminological precision.

- Use-instead entities, namely those which should be substituted for this entity.

Examples are: cases where an entity is outmoded for that model.
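Since the meta-level and sub-level references are defined as inverses of one another, one set can in principle be derived from the other. The sketch below assumes the meta-level references are held as a mapping from an entity's sequence number to the set of entities of which it is a component; the function name and the sample numbers are hypothetical.

```python
from collections import defaultdict

def sub_level_index(meta_level):
    """Invert meta-level references to obtain sub-level references.

    `meta_level` maps an entity to the entities of which it is a component;
    the result maps an entity to the entities that are its components.
    """
    sub = defaultdict(set)
    for entity, wholes in meta_level.items():
        for whole in wholes:
            sub[whole].add(entity)
    return dict(sub)

# Concepts 1 and 2 are used in theory 10; concept 2 is also used in theory 11.
meta_level = {1: {10}, 2: {10, 11}}
sub_level = sub_level_index(meta_level)
```

Whether both directions are stored on each record or one is regenerated when lists are produced is a file-organization choice that the original leaves open.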

An interesting map of relationships between conceptual entities is given in Figure 1. This shows the interlocking and meeting of concepts associated with measurement of simple physical phenomena.

1.2 Behavioural Models: At the same time that the modeling activity is undertaken on the compositional relationship in 1a, it should be useful to consider some non-compositional relationships to other entities. In other words, the effects of the presence of one conceptual entity on another in the "ecosystem of ideas"(9). By this is meant concepts which are indirectly undermined or strengthened by the validity of this concept, organizations whose monopoly is weakened by the presence of this organization.

Group 2: Contextual Structures

Again this is a poor title but refers to the historical and comprehensional relationships which constitute a context for the Group 1 current situation, and would be used in learning about the Group 1 situation.

2.1 Educational Models: These models would be produced by those modeling groups primarily concerned with education and making more sophisticated concepts comprehensible.

2.2 Historical Models: These models would be produced by those modeling groups interested either in historical research on the history of ideas or in providing an historical framework to assist education. It is probable that the educational and historical models should be considered together, which is why they have been grouped.

Group 3: Real World Systemic Relationships

The previous groups of models deal with the relationship between conceptual entities in anthropocentric terms or within the logic of particular disciplines. It is also useful to consider the systemic effects of real world entities on one another. This produces another pattern of relationships between the entities registered.

The best example of this distinction is the inter-disciplinary nature of environmental problems, when for example, it is the real world interaction of chemicals in food chains which causes egg shells to become thin -- leading to a high chick mortality rate in some bird species. For a social example, the relationships shown between the entities, represented by boxes in Figure 2, give a schematic representation of the factors binding a Canadian Indian to a pattern of problems.

Group 4: Term-oriented Models

In some cases where classification is rudimentary or non-existent, the emphasis is placed immediately on the terms. This is the case when:

There is no reason why each such set of terms should not be treated as a model as in the other groups. Where appropriate, the classification code position would be omitted and only the term positions used.

Group 5: Administrative Models

The assumption made in discussing the earlier groups of models was that the model was in some way a definitive structure on which new work would build. It is however possible to use the model building code to facilitate the administrative work on the definitive model.

Group 6: Mission-oriented Models

An assumption made in earlier groups is that the modeling bodies would all be discipline-oriented. There is however no reason why mission-oriented models should not be used where appropriate (e.g. in connection with development, environmental problems, etc.)

Group 7: Interdisciplinary Models

Clearly it is most important to avoid a "babble of models". A second level operation of model reconciliation to form a set of interdisciplinary or inter-model models could therefore be instituted when required.

These could either (i) be constructed (automatically by computer) from all the entities common to the models from which it is desired to produce an inter-disciplinary model, or (ii) be constructed by selection based on judgement of the best from each.
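Option (i), automatic construction from the entities common to several models, amounts to a set intersection over sequence numbers. In this sketch each model is assumed to be a mapping from sequence number to that model's classification code; the function names and the codes shown are invented for illustration.

```python
def common_entities(*models):
    """Sequence numbers present in every model supplied."""
    return set.intersection(*(set(m) for m in models))

def inter_model(*models):
    """For each common entity, collect the code each model gives it."""
    return {
        number: [m[number] for m in models]
        for number in sorted(common_entities(*models))
    }

# Hypothetical models from two disciplines, keyed by entity sequence number.
sociology_model = {101: "S.3", 102: "S.7", 205: "S.1"}
economics_model = {102: "E.2", 205: "E.9", 310: "E.4"}
shared = inter_model(sociology_model, economics_model)
```

Placing the two codes side by side for each shared entity is only a starting point; option (ii), selection by judgement, would still be needed wherever the models code the same entity incompatibly.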

Group 8: Future-oriented Models

A final assumption made in dealing with the earlier groups was that only the current or historical situations would be modeled. There is however no reason why speculative models should not be produced showing the relationships between entities at different points in the future. The modeling activity might then in some ways represent the Delphi method of forecasting.

Group 9: Personal Models

Perhaps a long term ideal is for a person to be able to "look at" (or interface with) the basic list of entities in terms of his own model which is his personal "thought file". Each new idea he gets could be usefully reflected in the structure of this file.

Group 10: Sub-models

In some cases a particular sub-branch of knowledge may be fragmented by reinterpretation, reconceptualization and redefinition of the same entities. It is then appropriate to use a "sub-modeling" strategy. In other words, instead of requiring "dissident" groups to conform or to divert their energies into a parallel model with differences in a minor area, a sub-model could be used to redefine that area in the dissident group's terms. The sub-model would therefore offer an alternative interpretation.

Group 11: Languages as sub-models

It may be convenient, for some purposes, to consider the relationships between theoretical formulations used in a particular language as a sub-model. The differences between the concepts encountered in Indo-European languages are relatively minor, so that term equivalents pose no great problems, but should it be necessary to enrich the system by incorporating theoretical formulations from other language groups, problems could arise.

Data to be included on each entity

A. Concept Filing Phase (Identification or Registration)

1. Entity Sequence Number (10): Each new conceptual entity, of whatever type (see earlier section), receives a unique number which is the next available in a sequential list. The number therefore contains no significant digits or codes and has no meaning for classification purposes. (It may be an advantage to use the check digit technique.)

For practical purposes, it may be convenient to pre-allocate blocks of numbers to different filing centres whenever required. This avoids problems of duplication and speeds up administration. Where duplication does occur, this is eliminated at the modeling stage.
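The pre-allocation of blocks can be sketched as follows. This is only an illustration under stated assumptions: the class name, the centre labels and the block sizes are hypothetical, and the proposal does not specify how blocks would actually be administered.

```python
class SequenceRegistry:
    """Hands out contiguous blocks of entity sequence numbers to filing centres."""

    def __init__(self, first_number=1):
        self._next_free = first_number
        self.blocks = []  # (centre, start, end) tuples, end inclusive

    def allocate_block(self, centre, size):
        """Reserve the next `size` numbers for `centre` and return them as a range."""
        if size <= 0:
            raise ValueError("block size must be positive")
        start = self._next_free
        end = start + size - 1
        self._next_free = end + 1
        self.blocks.append((centre, start, end))
        return range(start, end + 1)


registry = SequenceRegistry()
block_a = registry.allocate_block("centre-A", 1000)  # hypothetical centre
block_b = registry.allocate_block("centre-B", 500)   # hypothetical centre
print(block_a[0], block_a[-1], block_b[0], block_b[-1])
```

Because the blocks never overlap, two centres cannot issue the same number; what remains possible is that they file the same concept under two different numbers, which is the duplication left to the modeling stage.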

One advantage of this sequence number as a concept identifier is that it is not necessary to file a definition or conventional term at the same time. This is convenient if a new theoretical formulation has been tentatively conceived with known relationships to other concepts but with no clear definition or label yet. It avoids the need to coin doubtful neologisms in order to register the concept. In some cases it may even be an advantage to leave the term defined by its context of relationships, and not to bother attempting to find a suitable term. In that case the sequence number would be used as the only identifier until a suitable terminology for concepts in that domain can be elaborated more systematically.

2. Model Description

2.1 Model Number: The act of filing an entity is distinct from the later modeling activity. The "model number" in this case is "0". This artifice permits the definitions and the conventional terms or labels in different languages to be handled within the computer record framework together with the modeling and term allocation activity.

2.2 Sub-model Number (see model Group 10): Again, since entity filing is distinct from the later modeling activity, this zone is "free". It is therefore used to distinguish between:

2.3 Language (see model Group 11): Since the definitions or the label may be given in several languages a language code is used, (e.g. English "1", French "2", etc.).

2.4 Alternatives: There are bound to be cases, for a given language, in which alternatively worded definitions (with the same meaning) are put forward. Similarly, where several conventional terms or labels referring to the same entity exist, these may also have to be filed. A simple sequential code ("1", "2", etc.) is therefore used to distinguish between successive alternatives.

3. Cross-reference: Cross-references are used during the modeling phase so that this zone is "free". It is, however, used in this phase to identify the sequence number of :

4. Source Code: There are several possible ways of handling information about the source of information on the entity.

4.1 Ignore: In a simplified system it is not necessary to include it since such information can be found in a backup card file.

4.2 Abbreviate: Some general code, indicating the country, the publication or the filing group can be used.

4.3 Name: The name of the person, or filing organization, may be given in some abridged form (e.g. "DEUTKW" for Karl W. Deutsch).

4.4 Name and Support: In a more elaborate system, in which members of a discipline are expected to indicate any strong "support" or "opposition" to any new theoretical formulation, a "voting" technique may be envisaged (see page ). This option could be confined to the "elders" of the profession -- or left open to all members of a profession. As "professional" activity, this might be restricted to the modeling phase.

A given member of the profession, if sufficiently aroused, could then file his support or opposition in the form "DEUTKW +" or "DEUTKW -".
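A support or opposition entry of this form is simple to parse and tally. The sketch below assumes the entry is exactly an abridged name, a space, and a "+" or "-" sign; the function names are hypothetical, as are the name codes other than "DEUTKW".

```python
from collections import Counter


def parse_vote(entry):
    """Split an entry such as 'DEUTKW +' into (name_code, +1 or -1)."""
    name, _, sign = entry.strip().rpartition(" ")
    if not name or sign not in ("+", "-"):
        raise ValueError(f"not a support/opposition entry: {entry!r}")
    return name.strip(), 1 if sign == "+" else -1


def tally(entries):
    """Count support and opposition entries filed against one formulation."""
    votes = Counter()
    for entry in entries:
        _, value = parse_vote(entry)
        votes["support" if value > 0 else "opposition"] += 1
    return votes


# "AAAAAA" and "BBBBBB" are placeholder name codes, not real filers.
print(tally(["DEUTKW +", "AAAAAA -", "BBBBBB +"]))
```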

4.5 Name and Reference: It might be thought more valuable to give not only the name but the reference to the document in which the theoretical formulation is discussed and justified. On the question of abbreviations to document reference, one is immediately in the jungle of dispute amongst librarians, documentalists, etc. Several possibilities exist.

4.5.1 Use an extended bibliographical "standard" reference. This uses a lot of space and is mainly pleasing to librarians.

4.5.2 Use an abbreviated reference as in "Science Citation Index" (e.g. the first four letters of the first two significant words of the title, plus the year date, issue or volume number within which pagination is consecutive, and the first significant page number -- "DEUT KW -- NERV GOVE -- 1963 -- O -- 192").

4.5.3 Use a sequence number code. To avoid getting bogged down in documentation problems, a simple sequence number could be used for each publication:

either: i) referred to by the system (e.g. a complete sequence across all authors)

or: ii) referred to by the system for a given author (e.g. starting from zero for each new author).

A parallel "documentation" system would be required to decode the codes used in the approach but it might prove much tidier and more practical in the long run (e.g. "DEUTKW 509") (11). The precise page numbers might be an additional requirement (e.g. "DEUTKW 509-192"). Again, as a "professional" activity, this might form part of the modeling phase.
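A source code in the sequence-number style could be decoded mechanically before being looked up in the parallel documentation system. The following is a sketch only, assuming the layout shown in the examples "DEUTKW 509" and "DEUTKW 509-192" (upper-case name code, publication number, optional page number); the type and function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class SourceCode(NamedTuple):
    author: str            # abridged name, e.g. "DEUTKW"
    publication: int       # sequence number of the publication
    page: Optional[int]    # page number, if one was given


# Assumed layout: NAME, whitespace, publication number, optional "-page".
_PATTERN = re.compile(r"^([A-Z]+)\s+(\d+)(?:-(\d+))?$")


def parse_source_code(text):
    """Parse a source code such as 'DEUTKW 509-192'."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised source code: {text!r}")
    author, publication, page = match.groups()
    return SourceCode(author, int(publication), int(page) if page else None)


print(parse_source_code("DEUTKW 509"))
print(parse_source_code("DEUTKW 509-192"))
```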

5. Model Descriptor: This is not used during this phase.

6. Relationship Descriptor: This is not used during this phase.

7. Date Codes

7.1 Date first used: The date on which a theoretical formulation was first used is inserted here. If this is not supplied, the computer can automatically insert the date on which the entity was filed.

7.2 Date last used: This date is supplied as a result of general consensus by all modeling groups and is therefore not dealt with during this phase.

7.3 Retention period: It may be an advantage in this phase to tag some entities of unknown value so that they will automatically be dropped from the system after a certain period unless some contrary instruction is received in the meantime. Different retention periods can be used according to the status of the source.
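The automatic dropping of tagged entities could work roughly as sketched below. The record layout (dictionary keys such as `filed_on`, `retention_days` and `keep`) is an assumption made for the example; the proposal does not define how the retention period or the "contrary instruction" would be stored.

```python
from datetime import date, timedelta


def entities_to_drop(entities, today):
    """Return sequence numbers whose retention period has run out.

    An entity with no retention period, or one flagged `keep` (standing in
    for a "contrary instruction"), is never dropped.
    """
    expired = []
    for entity in entities:
        days = entity.get("retention_days")
        if days is None or entity.get("keep", False):
            continue
        if entity["filed_on"] + timedelta(days=days) <= today:
            expired.append(entity["sequence_number"])
    return expired


# Hypothetical records; retention periods could differ by status of source.
entities = [
    {"sequence_number": 251, "filed_on": date(1973, 1, 1), "retention_days": 365},
    {"sequence_number": 942, "filed_on": date(1973, 1, 1), "retention_days": 30},
    {"sequence_number": 1001, "filed_on": date(1973, 1, 1), "retention_days": 30, "keep": True},
    {"sequence_number": 1002, "filed_on": date(1973, 1, 1)},
]
print(entities_to_drop(entities, today=date(1973, 6, 1)))
```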

8. Status: For administrative purposes it is convenient to have a zone in which codes may be used to indicate that the entity is "under consideration", "of doubtful value", "no longer used", etc.

9. Text: The words or text used for:

would be inserted into this zone. This zone could also be used for any special comments which might be usefully added.

B. Concept Coding Phase (Modeling or Classification)

Many of the zones discussed above are used in this phase but for a different purpose or in order to establish computer records distinct from those created during the earlier phase or by other modeling groups.

1. Entity Sequence Number: This is repeated for each new relationship established within a model and is of course the same as that used in filing the entity in A.1.

2. Model Descriptor

2.1 Model number: As discussed elsewhere (see page ), each modeling group receives a unique number (e.g. "362") which identifies the system of relationships which are elaborated and filed, while at the same time distinguishing it (at computer level) from any other systems. There is some argument for attaching special significance to particular digits of the model number with a view to clarifying a hierarchy of models or, at least, showing a relationship between models. In other words, at this level a U.D.C.-type approach might be used so that "political science" models are all identified by "32N" and "anthropology" models by "39N". This is probably a temptation to be resisted however, since it has some theoretical implications which are better contained within models. In which case a simple sequential list should be established from which the next available model number could be taken.

2.2 Sub-model number: This is a zone to be used by a modeling group whenever a level of dissent is encountered so that alternative sub-models within the general model can be satisfactorily handled and identified. Normally, in the absence of sub-models, this would be "0".

2.3 Language: Since the relationship between concepts is supposedly language independent, this zone should normally be "0". There are, however, cases where relationships are identifiable in one language but absent, ridiculous, or ambiguous in a second. In such cases it may be convenient to use this zone for a form of language-dependent sub-model.

2.4 Alternatives: This zone is not used in this phase and must be "0" (to permit identification of the term records in the next phase where it is non-zero).

3. Cross-references: This zone supplies the main means by which the relationship of this entity to other entities is indicated for the particular model indicated in 2.1. The sequence number of the other entity is indicated here. In effect, every such "relationship" gives rise to a new computer record (see figure 3). The type of relationship is either implicit, because of the model used, or is described in 6 and 7.

4. Source Code: Depending on the method chosen (see A.4.1, A.4.2, A.4.3, A.4.4, and A.4.5), the source coding would probably either be allocated during the concept filing phase with nothing in this phase, or in this phase with nothing in the previous phase. In the most sophisticated system, it might however be desirable to give:

Source coding during the modeling phase might be particularly helpful in the administrative work of elaborating a model, since it permits members of a modeling group, working independently and in isolation, to "vote" on the insertion or deletion of particular relationships (see A.4.4). Such a postal vote system would be particularly helpful in clarifying with precision just what was under discussion at any point in time.

5. Model Descriptor: This zone is used to indicate which model is to be considered at the entity cross-referenced in 3. In a simplified system this zone would not be required because the assumption would be made that each model was totally isolated from other models.

In a more sophisticated system however, there is need for a means of expressing relationships between parts of models. For example, it may be that in a certain domain two models are identical or that one forms a subset of the other. In such a case there is little need to duplicate all the relationships in the second model, provided cross-reference between the models is possible.

5.1 Model number: As for 2.1, but is only to be entered at the entity to which the cross-reference in 3 refers.

5.2 Sub-model number: As for 2.2, but again is only to be entered at the entity to which the cross-reference in 3 refers.

5.3 Language: As for 2.3, but again is only to be entered at the entity to which the cross-reference in 3 refers.

5.4 Alternatives: Not used. (This zone may even be omitted entirely.)

6. Relationship Descriptor: This zone is used to describe the relationship constituted by the link between this entity and that cross-referenced in 3. Two basic types of relationship descriptors may be distinguished.

6.1 Relationship descriptor A: This is used to give an indication of the relative levels of the two entities related (e.g. class and member), directions for flow (e.g. from or to), etc. These are used, for example, to indicate any hierarchical relationships. These codes and the cross-reference in 3 are all that is required for a graph-theoretical analysis of the network of concepts. It is here that any "see other" code would be inserted.

It is also important to indicate the type of relationship between two entities (see page ):

This is an indication of what is flowing or the nature of the relationship. It does not seem feasible to predetermine the possible types of relationship which might be required. The technique which can be adopted is therefore to use a simple numeric code - the next available in a sequential list - for each new type of relationship with which a modeling group wishes to work.

The arrangement in this zone could be left up to the modeling group. It is desirable that standard codes should be developed to facilitate graph-theoretical analyses and that a standard code system should be used to denote types of relationships (e.g. "321" where the numbers have no special significance).

6.2 Relationship descriptor B: This is used for evaluation descriptors. In other words the codes used here supply some form of ranking to the relationship described in 6.1 (e.g. some measure of relative importance (within the model), some measure of degree of relativity, etc.)

It is in this zone that the degree of consensus on the characterization of the concept by the discipline could be coded. The zone may even be used to carry quantitative information on the size of flow represented by the relationship and also its periodicity, if relevant.

Again, the arrangement of this zone could be left up to the modeling group. It is however desirable that a standard form should be developed -- even if exceptions to it are frequent.
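To illustrate how the cross-reference in 3 and the descriptors in 6 could feed a graph-theoretical analysis, here is a minimal sketch. The field names, the "class-of" level label and the sample records are hypothetical; only the idea that each relationship is one record linking two sequence numbers within a model comes from the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    entity: int            # zone 1: sequence number of this entity
    model: int             # zone 2.1: model number
    cross_reference: int   # zone 3: sequence number of the other entity
    level: str             # zone 6.1: relative level or direction (assumed label)
    kind: int              # zone 6.1: sequential relationship-type code, e.g. 321
    weight: float = 1.0    # zone 6.2: evaluation or ranking (assumed numeric)


def adjacency(records, model):
    """Build a directed adjacency list for one model from its relationship records."""
    graph = defaultdict(list)
    for record in records:
        if record.model == model:
            graph[record.entity].append(record.cross_reference)
    return dict(graph)


def reachable(graph, start):
    """Return every entity that can be reached from `start` by following links."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


records = [
    Relationship(251, 362, 942, "class-of", 321),
    Relationship(942, 362, 1001, "class-of", 321),
    Relationship(251, 417, 77, "class-of", 321),
]
graph = adjacency(records, model=362)
print(graph)
print(sorted(reachable(graph, 251)))
```

Filtering on the model number keeps each model's network separate, which matches the simplified system in which models are treated as isolated from one another.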

7. Date codes

7.1 Date first used: This may be used to indicate the date each relationship between entities was first noted, or alternatively the computer can automatically insert the date on which the relationship was first filed.

7.2 Date last used: This date may be used when the relationship is finally rejected as invalid or unacceptable.

7.3 Retention period: This zone may be used by members of a modeling group to communicate with one another. A member may submit "trial balloon" relationships, with a very short (one-cycle) retention period so that others can "see how it looks". Once agreed, the retention period can be set so that relationships periodically come up for review.

8. Status code: For modeling group administration purposes it is convenient to have a zone which may be used to indicate that the relationship is "under consideration", "a tentative proposal", "a firm proposal", "agreed by the group", "requires priority attention", etc.

9. Text: Normally a relationship record should require no text. There is however no reason why this zone should not be used for any text comments on relationships which may seem significant to the modeling group.

C. Term Allocation Phase

1. Entity sequence number: Required as before.

2. Model descriptor A:

2.1 Model number: Required as before. A term can only be authoritatively allocated within the modeling group. It is utopian to expect that consensus can be consistently achieved between modeling groups on a unique authoritative term for the entity to which they all refer in their different ways.

2.2 Sub-model number: This should normally be zero, since it will probably be easier to achieve consensus on a term between model and sub-model than between model and model.

2.3 Language: Required as before for each language version of the authoritative term.

2.4 Alternatives: This must be "1" or greater to distinguish the term records from the relationship records. If alternative authoritative terms are required in a given language the zone would be used to distinguish between them.
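Taken together, the rules for zones 2.1 and 2.4 in sections A, B and C allow the three kinds of record to be told apart from two numbers alone. The following is a sketch of that test, with a hypothetical function name and phase labels:

```python
def record_phase(model_number, alternatives):
    """Classify a record from its model number (2.1) and alternatives zone (2.4).

    - model number 0: concept filing record (definition or label)
    - model number > 0, alternatives 0: relationship record (modeling phase)
    - model number > 0, alternatives >= 1: term record (term allocation phase)
    """
    if model_number < 0 or alternatives < 0:
        raise ValueError("zones must be non-negative")
    if model_number == 0:
        return "filing"
    if alternatives == 0:
        return "relationship"
    return "term"


print(record_phase(0, 2))    # a second alternative definition filed for an entity
print(record_phase(362, 0))  # a relationship within model 362
print(record_phase(362, 1))  # the first authoritative term within model 362
```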

3. Cross-reference: Normally this would be "0". It may however be necessary to indicate other entities using the same term (but obviously with a different meaning).

4. Source code: There may be some cases where it is important to indicate the document in which the justification for the unique authoritative term is urged.

5. Model descriptor B: May be required if the cross-reference to a use of the same term in a different model is needed.

6. Relationship descriptor: Not required.

7. Date codes

7.1 Date first used: This may be used to indicate the date the term was first used, or alternatively the computer can automatically insert the date on which the term was first filed.

7.2 Date last used: Terms fall from favour. The last date of use can be indicated here.

7.3 Retention period: May be used as in B.7.3.

8. Status code: May be used as in B.8.

9. Text: The words used in the authoritative term are inserted into this zone. Alternatively, the equivalent decimal coding could be inserted, if desired.

Limitation of scope and sources of concepts

1. Scope: The design of the system is sufficiently general that it could be used to order theoretical formulations in any area of knowledge. Such broad coverage would clearly be impracticable, and probably even undesirable, in the foreseeable future.

It is useful to re-emphasize that the proposal is not concerned with the areas covered by social science documentation as there are many such documentation projects. The UNISIST report mentions the parallel programs proposed by such bodies as the International Council of Social Sciences and the International Committee for Social Sciences Documentation. There are numerous equivalent projects at the national level. The object here is to concentrate on theoretical formulations which may or may not be mentioned in a given collection of documents.

The priorities proposed would be based on three dimensions:

  • commencing with the more abstract formulations and then moving to the more specific or concrete
  • commencing with formulations of interest to several social science disciplines and then moving to those common to several schools of thought, and finally to those current within one school of thought only. (The suggestion is that an effort should be made to elaborate the significance of "inter-", "multi-" or "trans-disciplinary" concepts as a priority area of study with respect to knowledge, analogous to the focus on international relations as opposed to national level activities. The degree of interdisciplinarity of a concept is a valuable means of determining priorities (12).)
  • commencing with theoretical formulations before going on eventually to methods and supporting data

This does not of course preclude any modeling group from concentrating solely on the formulations of its own school of thought. The main concern however should be to ensure that the system reflects the general framework of theoretical formulations. Highly specialized formulations should not clutter up the modeling activity. Little effort should be made to include minutiae about particular social entities which have not been reflected in more general formulations -- unless such minutiae represent unique evidence of the need for new formulations. The system should be compact and easy to use rather than large and unwieldy as are most documentation systems.

2. Sources: Guidance in limiting scope can be obtained by concentrating in the light of the above priorities on concepts mentioned in such publications as

2.1 David L. Sills (Ed.), International Encyclopedia of the Social Sciences, Macmillan, 1968.

2.2 Julius Gould and W.L. Kolb (Eds.), A Dictionary of the Social Sciences (compiled under the auspices of Unesco), Free Press, 1964.

2.3 UNESCO. Main Trends of Research in the Social and Human Sciences. Paris, Unesco, (Part one: social sciences, 1970, 819 p.; Part two: human sciences, 1972). Also in French edition.

2.4 International Committee for Social Sciences Documentation. International bibliography of the social sciences. Tavistock, 4 annual volumes (sociology, political science, economics, social and cultural anthropology).

2.5 Key textbooks in each discipline.

2.6 Specialized multi-lingual dictionaries and glossaries, such as:

    • Gunter Haenich. Dictionary of International relations and politics; systematic and alphabetical in four languages (German/English/French/Spanish). Elsevier, 1965. This dictionary has 5778 terms with equivalents in the four languages.
    • I. Paenson. English/French/Spanish/Russian Systematic Glossary of Select Economic and Social Terms. Oxford, Pergamon, 1964. Attempts to present a system of inter-related concepts which reflect a vertical hierarchy and are presented within a continuous text in a systematic exposition of a given subject.

2.7 Institute for Scientific Information. Social Sciences Citation Index. Philadelphia. (Included in the SSCI are three separate but related indexes of different periodicity covering the literature of the specified calendar year. Price in 1973: $1,250 (sic).)

2.8 International social science organisations. A preliminary count indicates that possibly some 30 such bodies could contribute in some way to the project.

Concept notation in documents

It has been stressed that this project does not require a complex notation system since each concept is represented by a single sequence number, plus an indication of the model number in question, if required. Nevertheless, since one object of this approach is to permit scholars to refer with precision to a particular concept in their papers, a standard method of indicating such a concept in print is required.

A similar problem arises in the natural sciences in distinguishing between different isotopes of the same atom (i.e. cases where slightly different versions of the same atom exist due to differences in atomic weight), where the same symbol does not distinguish between isotopes. The solution adopted is to indicate the atomic weight as a superscript to the standard symbol.

In the case of concepts, represented in print by the same word, one solution would be to use the sequence number of the concept as superscript to the word:

e.g. democracy+251 democracy+942

To avoid confusion with bibliographical references, the number could perhaps be preceded by an asterisk.
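If the asterisk convention were adopted, concept references could be picked out of a running text automatically. The sketch below assumes a plain-text rendering in which the word is followed directly by an asterisk and the sequence number (for example "democracy*251"); this inline form, the pattern and the function name are assumptions for illustration, since the proposal itself envisages a superscript.

```python
import re

# Assumed inline form: a word, an asterisk, then the concept sequence number.
CONCEPT_REF = re.compile(r"\b([A-Za-z][A-Za-z-]*)\*(\d+)")


def concept_references(text):
    """Return (word, sequence_number) pairs for every marked concept in `text`."""
    return [(word, int(number)) for word, number in CONCEPT_REF.findall(text)]


sample = "Both democracy*251 and democracy*942 appear, unlike note 12."
print(concept_references(sample))
```

The asterisk is what keeps an ordinary bibliographical reference number, such as the "12" in the sample, from being mistaken for a concept number.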

There is a strong temptation to adopt a technique for uniquely identifying concepts similar to that of the International Standard Book Numbering (ISBN) system (now used on the reverse of all recent book title pages) to give a unique code to each book. This number consists of 10 digits made up of the following parts:

  • group identifiers (i.e., national, geographical, language or other convenient group). An "agency" coordinates the allocation of numbers within each group, e.g., one for Anglo-American publications ("a"), one for UN system publications, etc. The group identifier is allocated by an international standard book numbering agency (in formation). (This could be considered as a concept filing centre identifier allocated by some loose coordinating body.)
  • book publisher identifiers. The publisher identifier is allocated internally within the group by the group agency. (This could be considered as an accredited concept filing source identifier allocated with respect to the filing centre for which it locates new conceptual entities.)
  • book title identifiers. A block of sequence numbers is reserved for each publisher to permit him to select the next available for the next book. (This could be considered as a block of sequence numbers for concepts, so that each accredited source can select the next number as each new concept is identified.)
  • check digit. This ensures that the code has been correctly transcribed and input to the computer. A computer pre-generated list of "available" sequence numbers incorporates this digit (which is calculated on a modulus 11 with weights 10-2, using X in lieu of 10 where 10 would occur as a check digit).

The total length is 10 digits, but the three identifiers only total 9 digits. In order to avoid wastage of numbers or lack of sufficient numbers, publishers with a large book output (of which there are few) have a two or three digit identifier so that the title identifiers can use six or five digits. A small publisher (of which there are many) has a five or six digit identifier so that the title identifier can use two or three digits. The publisher identifier is therefore selected on the basis of his output using from two to six digits as required. Hyphen separators are used.
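The check digit calculation described above (modulus 11, weights 10 down to 2, "X" standing for 10) can be written out as a short sketch. The function names are hypothetical; the same routine could in principle serve the check digit suggested for entity sequence numbers in A.1, although the proposal does not commit to this particular scheme.

```python
def isbn10_check_digit(first_nine):
    """Compute the tenth character from the nine identifier digits.

    Hyphens and spaces are ignored. The digits are weighted 10, 9, ..., 2,
    and the check digit is whatever brings the weighted sum to a multiple of 11.
    """
    digits = [int(ch) for ch in first_nine if ch in "0123456789"]
    if len(digits) != 9:
        raise ValueError("expected exactly nine digits")
    total = sum(weight * digit for weight, digit in zip(range(10, 1, -1), digits))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_isbn10(code):
    """Check a full ten-character code, with or without hyphen separators."""
    cleaned = code.replace("-", "").replace(" ", "").upper()
    if len(cleaned) != 10:
        return False
    body, last = cleaned[:9], cleaned[9]
    if not (body.isascii() and body.isdigit()):
        return False
    return isbn10_check_digit(body) == last


print(isbn10_check_digit("0-306-40615"))
print(is_valid_isbn10("0-306-40615-2"))
```

A single mistyped digit changes the weighted sum by an amount that is never a multiple of 11, so such transcription errors are always caught.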

The temptation to use this system should however be resisted. While the significance attached to the digits is only "administrative" and has no "theoretical" implications, problems of overflowing the allocated blocks are bound to occur. The system will "bulge" in unpredictable areas as the U.D.C. has done. It is also questionable whether so much significance should be placed on the source which, once the concept has been incorporated, will quickly become irrelevant within the network of other related concepts from other sources.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.