Project Features toward a Concept Inventory
Originally appeared in 1973 as part of Toward
a Concept Inventory
Organization of project
The success of a project of this type would be dependent upon the extent to
which any central organisation can be avoided in favour of a process
of catalysis. There is too much to be done to run the risk of the usual jurisdictional,
behavioural, and personality problems associated with a centralised organisation.
Such problems rapidly alienate potential support. The problem is therefore to
bring into existence a decentralised network of groups working on different
aspects of the project, but able to exchange the results of their activities.
(It is important to remember that it is probably impossible to "organise"
a whole area of knowledge because the latter is well subdivided into territories
and "stamping grounds" whose incumbents are reasonably content with
the current situation. It may, however, be possible to offer them a reasonably
neutral device by which they can each facilitate and order their own particular
approach, and, as a by-product, see more clearly its relationship to that in
other "neighbouring" territories. Having by this means obtained a
decentralised picture of the current situation, it is then possible, in a totally
distinct process, to lobby the incumbents into participating to some degree
in inter-territory efforts at organising areas of knowledge whilst guaranteeing
safeguards for the protection of their "sovereignty".)
A. Launching Phases
A number of phases can be envisaged, some of which could overlap.
1. Investigation: During this Phase the project would be investigated
in detail by circulating proposals among appropriate specialists. The main object
would be to ensure that the proposal is oriented in the right direction, and
that funds for pilot projects are obtained. This phase may be considered to
be underway already, through the actions of the COCTA committee.
2. Pilot Projects: During this phase efforts would be devoted
to the following areas:
a) computer program development and file organization.
b) operational and logical problems of classification with models in a few
c) computer simulation of file movement and modeling activity in a decentralized,
minimum-organization environment. It would be particularly valuable to gain
some insight into the behavioural problems of rivalry and suspicion between
model building groups, and efforts to "take over" the system.
d) computer simulation of different strategies to keep the system "open"
to theoretical formulations from as wide a range of sources as possible whilst
trying to minimize the inclusion and retention of formulations of dubious
3. Agreement on Standard Formats: On the basis of the previous
phase, standard formats for filing new formulations and for holding them on
magnetic media would be agreed. Since this is a new type of project, it should
not encounter the apparently insurmountable difficulties of those concerned
with organizing the computerized exchange of bibliographical information.
4. Production of Standard Software: Once agreement has been reached,
a standard computer program can be made available to all those bodies
which wish to initiate some concept modeling activity, to act as a central filing
point for their particular constituency. It is possible that initially only
one body will be active, possibly as an extension of the pilot project stage.
5. Filing Procedure: Once a standard filing or registration form
is developed, there should be no difficulty for any group in receiving and filing
identified concepts. This can of course be done by mail.
By filing is meant the purely administrative activity of preparing the forms
for the computer. There should be a minimum of judgmental effort at this stage,
and none with respect to the theoretical problems of the subsequent modeling
activity. The object is to get the incoming information into a form which facilitates
the activities of the members of the modeling bodies.
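The filing step described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names FiledConcept and Registry, and the fields description and source, are hypothetical and do not come from the proposal. The point shown is that filing assigns the next sequential number and makes no theoretical judgement.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class FiledConcept:
    number: int        # sequential, non-significant file number
    description: str   # the concept as stated on the registration form
    source: str        # who submitted the form (hypothetical field)


@dataclass
class Registry:
    """One filing point: assigns numbers, makes no judgement on content."""
    _next: count = field(default_factory=lambda: count(1))
    concepts: dict = field(default_factory=dict)

    def file(self, description: str, source: str) -> FiledConcept:
        # Purely administrative: tidy the text and attach the next number.
        record = FiledConcept(next(self._next), description.strip(), source)
        self.concepts[record.number] = record
        return record


registry = Registry()
first = registry.file("pre-theory", "group A")
second = registry.file("  paradigm ", "group B")
```

Any classification of the filed record would happen later, in the modeling operations, and would refer to the record only by its number.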
The area of difficulty which does require examination is that of how to decide
who should not be permitted to submit concepts for filing into the common data
base. This point is considered below.
B. Periodic Operations
1. Lists of Formulations: Periodically the sequence of identified
concepts held on magnetic tape should be scanned to produce lists for circulation
to the modeling bodies and, if required, their members. Two types of lists can be produced:
a) lists of newly-registered concepts which must be scanned by each modeling
body to see whether they are in any way relevant to its concerns
b) lists of the complete sequence of concepts for newly formed modeling bodies
wishing to re-examine all possible formulations and interrelate them in their
2. Modeling or classification: The lists derived from the previous operation
can be examined by the modeling bodies in committee or distributed by post to
their members. From these (postal) deliberations should emerge a collective
opinion on the place within the classification scheme, of each identified concept
reviewed. If necessary, a "provisional" view can be formulated by
the use of appropriate coding. In fact this might be a most useful way of submitting
a committee's view for wider consideration. Different degrees of "definitiveness"
could thus be envisaged.
3. Feedback of Model Information: The details of the place of
the concept within a particular model would be indicated on a standard form
which could be returned by post for keypunching and incorporation. A modification
of this approach would be to permit individual committee members to each return
forms for any new entity under consideration. In this way all the alternatives
would be incorporated into the model with some "provisional" code
so that each member could see the proposals of the others, and their implications.
In some cases, this could even be operated as a means of postal voting on the
treatment of controversial concepts. The administrative load of the committee
is in this way largely computerized.
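As a rough illustration of how returned forms might be tallied, the following Python sketch counts the placements proposed by individual members and marks a concept as "provisional" until all returned forms agree. The tuple layout of a form, the place codes and the status labels are assumptions made for the example only.

```python
from collections import Counter


def tally_placements(forms):
    """forms: list of (member, concept_number, proposed_place) tuples."""
    by_concept = {}
    for _member, number, place in forms:
        by_concept.setdefault(number, Counter())[place] += 1

    result = {}
    for number, votes in by_concept.items():
        place, top = votes.most_common(1)[0]
        # Unanimous forms give an agreed place; anything else stays provisional.
        status = "agreed" if top == sum(votes.values()) else "provisional"
        result[number] = (place, status, dict(votes))
    return result


forms = [
    ("member 1", 17, "A.2"),
    ("member 2", 17, "A.2"),
    ("member 3", 17, "B.1"),
    ("member 1", 18, "C.4"),
    ("member 2", 18, "C.4"),
]
summary = tally_placements(forms)
```

Listing the full vote counts alongside the leading placement lets each member see the proposals of the others, which is the purpose of the provisional code.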
4. Input of Model Information: The forms from each modeling body
would be handled at the central registry point (for that constituency), keypunched
and fed onto the magnetic tape file. Keypunching errors would be corrected there
as far as possible.
5. Production of Model Amendment Lists: Whenever required, the
concepts incorporated into a given model would be selected and sorted into the
thesaurus-type structure appropriate to the model and listed for distribution
back to the members of the modeling body. This gives members an updated model
with all the concepts coded to different levels of "provisionality".
Members can then reconsider their views and proceed from Operation 2 above or,
alternatively, for those formulations which have been classified to the agreement
of all concerned, the term allocation operation may be initiated.
6. Allocation of Model Terms: Working from the concepts structured
into a thesaurus-type order, members can allocate terms to each entity in English
and whatever other languages are considered necessary. Again, there is no reason
why "provisional" coding should not be used to cover various working
cycles of term allocation.
7. Feedback of Term Information: As with model information, the
alphanumeric terms allocated to each concept can be indicated on a standard
form which could be returned by post for keypunching and incorporation onto
8. Input of Term Information: The forms from each body allocating
terms within a model would be handled at a central registry point, as with the
model information itself.
9. Production of Term Lists: Whenever required, the concepts incorporated
into the model would be selected and sorted into term lists, either in alphanumeric
order or in terms of a thesaurus-type structure. This gives members an updated
model expressed in terms coded to different levels of "provisionality".
Members can then reconsider their views and proceed from Operation 6 above.
It is clear that the above operations permit a quite extensive degree of "de-committeefication".
Members of a modeling body can individually register their views and preferences
by post on each concept in the model and in their own time. The resulting lists
are circulated and amended to firm up progressively the consensus on each point
until final agreement can be reached. Alternatively, if there is a final difference
of opinion, then this can be registered as such. Actual discussion need only
take place when the accumulation of cases (which cannot be handled by correspondence
and a "modeling bulletin" mechanism) merits such contact.
C. Subsequent Phases
A number of phases can be envisaged which follow on from those detailed in "A"
above. They do not, however, modify the basic operations noted
1. File Movement: One of the disadvantages of isolated registration
points is that concepts common to two or more constituencies will not necessarily
be juxtaposed. In particular, unless each such point is allocated a block
of sequence numbers, there are liable to be overlapping sequence numbering systems
which would jeopardize the whole project.
One means of avoiding this, aside from allocating blocks of numbers to each
registering point, is to circulate copies of the files between registration
points. (Either the tapes themselves could be moved or data links could be used.)
This might be considered a standard procedure by which duplicates in all newly-coded
concepts could be located and grouped together for consideration by each of
the interested modeling bodies, prior to arriving at a "final" decision.
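Both safeguards mentioned above, blocks of numbers and the search for duplicates across circulated files, can be sketched as follows. The block size, the discipline names and the crude text normalisation used to spot duplicates are assumptions for illustration; real duplicate detection would need the judgement of the modeling bodies.

```python
def allocate_blocks(points, block_size):
    """Give each registration point its own non-overlapping range of numbers."""
    return {
        point: range(i * block_size + 1, (i + 1) * block_size + 1)
        for i, point in enumerate(points)
    }


def find_duplicates(files):
    """files: {point: {number: concept_text}}.

    Groups numbers whose text matches after lower-casing and collapsing
    spaces (a deliberately crude, assumed matching rule).
    """
    groups = {}
    for concepts in files.values():
        for number, text in concepts.items():
            key = " ".join(text.lower().split())
            groups.setdefault(key, []).append(number)
    return {key: sorted(nums) for key, nums in groups.items() if len(nums) > 1}


blocks = allocate_blocks(["sociology", "political science"], 1000)
files = {
    "sociology": {blocks["sociology"][0]: "Role conflict"},
    "political science": {
        blocks["political science"][0]: "role  conflict",
        blocks["political science"][1]: "Legitimacy",
    },
}
duplicates = find_duplicates(files)
```

The groups returned by find_duplicates are only candidates to be put before the interested modeling bodies, not decisions.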
The circulation of such information can be made very rapid. A courier file
can be circulated between the registration points for a particular discipline.
Information is copied onto and off the file at each such sub-speciality point. At one point in
its movement, such an intra-discipline file could interact with an inter-discipline
file (e.g. for disciplines in the same group) to permit a similar two-way transfer
to take place. Similarly a higher level courier file moving between groups of
disciplines could permit further exchange.
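The courier mechanism can be simulated in miniature. In the Python sketch below, a courier file visits each registration point in turn and a two-way copy takes place at every stop. Representing files as sets of concept numbers, and the point names P1 to P3, are simplifying assumptions.

```python
def circulate(courier, points):
    """Visit each registration point once, copying information both ways.

    courier: set of concept numbers carried on the courier file
    points:  {name: set of concept numbers held locally}, visited in order
    """
    for local in points.values():
        courier |= local   # copy local information onto the courier file
        local |= courier   # copy the courier's contents off to the point
    return courier


points = {"P1": {1, 2}, "P2": {1001}, "P3": {2001, 2002}}
courier = circulate(set(), points)
after_first_pass = {name: set(held) for name, held in points.items()}

# A second circuit brings the early points up to date as well.
circulate(courier, points)
```

After one circuit only the last point visited holds everything; the first point catches up on the next circuit, which is why the file keeps circulating rather than making a single trip.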
In this way cross-disciplinary confusion could be avoided. Clearly refinements
are possible by using mission-oriented files or geographical area files. The
system is very flexible. It could even be made to interact with "classified"
files by using security, subject matter and evaluation filters to govern the
The key feature is that it does not require more than a bare minimum of overall
organization or funding. It can be extended very loosely in response to the
initiative of any highly specialized discipline. Registration points are created
wherever (in terms of subject, jurisdiction or geographical level) there is
sufficient common interest -- i.e. motivation plus resources. This gets around
the current situation in which vain attempts are made to get significant funding
for general multi-purpose projects, particularly via any international program.
If cross-jurisdictional problems arise in particular areas, all the administrative
work there may be delegated under contract to some party judged to be impartial
and uninvolved: a commercial computer service bureau, a university, or a
government agency. Alternatively, a user cooperative point might be organized.
The costs involved at each collecting point are:
(a) conversion of information and queries to machine-readable form
(b) processing and output relevant to immediate user contacts
(c) transport costs of the courier file to the next collecting point.
The funds are expended locally in a manner which can be immediately justified
and yet this results in making available current information from points very
conceptually distant within the system.
D. Accredited Sources
It is clearly an advantage to allocate responsibility for modeling group activity
in a particular domain to the appropriate international professional organization.
The difficulty arises in determining which sources of information should be
recognized by such modeling groups. In the earlier phases when the group is
working through the standard texts, few problems should arise. But once a model
is available for inspection, problems will arise in determining whose suggestions
for additions or amendments should be accepted. Within a well-defined profession
this difficulty may be avoided by recognizing only accredited members of the
profession. The right to submit amendments then becomes a right accorded by
the profession. This procedure will undoubtedly lead to conflict when areas
common to a number of disciplines are considered (e.g. the social sciences,
in general), unless each discipline is restricted to its own model.
A distinction should also be made between the right to file an entity and the
right to suggest amendments to the model. There is some advantage in giving
wider access for filing, but limiting the "retention period" of the
entity filed according to the professional standing of the filer.
A later development could be the possibility of retaining entities only if
a supporting "vote" was registered by an appropriate number of appropriately
accredited persons. The degree of support could be a real time measure of the
degree of significance to the discipline of a given theoretical formulation.
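One way to picture the two retention rules is the Python sketch below: a retention period that depends on the standing of the filer, and indefinite retention once enough supporting votes are registered. The periods, the two standing categories and the vote threshold are invented for the example and are not proposed values.

```python
from datetime import date, timedelta

# Hypothetical retention periods by standing of the filer (assumed values).
RETENTION = {
    "accredited": timedelta(days=3650),
    "open": timedelta(days=365),
}
VOTES_TO_KEEP = 3  # assumed number of supporting votes needed


def is_retained(filed_on, standing, supporting_votes, today):
    """Keep an entity while its period runs, or indefinitely once supported."""
    if supporting_votes >= VOTES_TO_KEEP:
        return True
    return today <= filed_on + RETENTION[standing]


today = date(1975, 1, 1)
expired = is_retained(date(1973, 1, 1), "open", 0, today)
kept_by_standing = is_retained(date(1973, 1, 1), "accredited", 0, today)
kept_by_votes = is_retained(date(1973, 1, 1), "open", 3, today)
```

The count of supporting votes could also be reported directly as the measure of significance mentioned above, rather than reduced to a keep-or-drop decision.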
Whatever procedure is adopted, it is essential, for the vitality and general
relevance of the project, that a wide range of people and organizations should
be in a position to add entities to the file - given a few simple safeguards.
In this way the interests of every relevant discipline, school of thought, problem
area, "approach" or paradigm should be protected. The system would
therefore be "open" to social scientists writing in any language or
taking any epistemological or ideological position.
Classification and modeling
1. Nature of classification
There is a considerable terminological variation in the scientific literature
that characterizes the use of the term "classification". Dalenius
and Frank, after making this observation (3) define the term as follows:
"Consider a collective of objects of some kind and a set of mutually
disjoint classes. Every object belongs to one, and only one, of these classes.
By classification we will denote the act of assigning the objects into these
classes. In taxonomy, classification indicates the act of creating classes
according to some principle, the term "identification" is used for
classification as used in this paper. By the same token, the term "coding"
is rather ambiguous. We refrain from its use here, but mention that classification
as used in this paper is referred to as coding in the literature dealing with
e.g. population censuses."
This definition, whilst appearing to be inclusive, in fact only covers one
type of classification, namely hierarchic classification where classes are mutually
disjoint. Classification of theoretical formulations is one area in which classes
may or may not be mutually disjoint.
J.H. Shera has made an excellent general assessment of the problems of general
library classification in an article of his, originally published in 1953 and
reprinted in his book, Libraries and the Organization of Knowledge. He
concludes that the hierarchical form in itself is not a sufficient basis for
the classification of knowledge and that what is required is a directed graph,
or non-hierarchic representation.
The relationship between hierarchic and non-hierarchic classification
schemes has been the subject of considerable work by Jardine and Sibson (4).
They are particularly interested in the stability of the classification produced
by a given method as the amount of information (or number of attributes) is
increased for the entities being classified. They are looking for measures of
distortion introduced by the imposition of a given classification scheme.
This work makes it clear that the process of classification can introduce distortion
and that this can be avoided by using a directed graph representation. In this
project the distinction is made between the filing process, the classification
process, and the term allocation process.
It is useful to think of the first stage of the classification process as one
of "relationship indication", in which the relationships of a given
theoretical entity with other entities are inserted. This results in a "directed
graph" network of entities which can be searched by computer, particularly
to detect clusters with certain properties. This stage corresponds to the determination
of similarity or dissimilarity between entities.
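A minimal sketch of "relationship indication" and a computer search for clusters might look as follows in Python. Relationships are stored as (source, kind, target) triples, and a cluster is taken to be any group of entities connected by relationships regardless of direction. Both the triple layout and this particular notion of cluster are assumptions; other cluster properties could equally be searched for.

```python
from collections import defaultdict


def clusters(relationships):
    """Group entities linked by any relationship, ignoring direction
    (the weakly connected components of the directed graph)."""
    neighbours = defaultdict(set)
    for source, _kind, target in relationships:
        neighbours[source].add(target)
        neighbours[target].add(source)

    seen, found = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbours[node] - group)
        seen |= group
        found.append(sorted(group))
    return found


relationships = [
    (1, "uses", 2),
    (2, "part-of", 3),
    (4, "see-also", 5),
]
groups = clusters(relationships)
```

Here entities 1, 2 and 3 form one cluster and entities 4 and 5 another, although no class boundaries have been imposed on either.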
In a second stage, the above network can be distorted so that its elements
can be fitted into a chosen set of classes with a certain relationship to one
another. This is "classification" as opposed to the previous phase
which inserts relationships irrespective of any class boundaries. It is convenient
to call this activity "modeling". Clearly the modeling activity is
a valuable preliminary to "classification". It is particularly valuable
in that once completed, different systems of classification can be compared
using the entities inter-related by the model, i.e. different degrees of distortion
can be imposed upon the network of entities according to the immediate needs
of the user. It may be useful to think of modeling in this context as a long-term
multi-person activity, whereas a given classification can be selected from the modeled
entities in terms of short-term, need-oriented considerations which permit certain
relationships in the network to be considered as "irrelevant" -- permitting
the isolation of simple, possibly hierarchic, classification schemes. In some
cases, it may however be preferred not to distinguish modeling from classification
and to blur the two operations into one another.
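The selection of a classification from a model can be illustrated by filtering the network. In the Python sketch below, a user declares which kinds of relationship are relevant to the need at hand; everything else is treated as "irrelevant" and what remains is read as a simple parent-to-children scheme. The relationship names and the (child, kind, parent) layout are assumptions.

```python
def classify(relationships, relevant_kinds):
    """Keep only relationships of the chosen kinds; return parent -> children."""
    scheme = {}
    for child, kind, parent in relationships:
        if kind in relevant_kinds:
            scheme.setdefault(parent, []).append(child)
    return scheme


model = [
    (2, "part-of", 1),
    (3, "part-of", 1),
    (3, "see-also", 2),
    (3, "undermines", 4),
]

# A short-term, need-oriented view that keeps only compositional links.
hierarchy = classify(model, {"part-of"})

# A different user may impose a different degree of distortion.
wider = classify(model, {"part-of", "undermines"})
```

The model itself is left untouched, so any number of such classifications can be drawn from the same set of inter-related entities and compared.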
2. Filing and Classification
In the case of document indexing applications, no distinction is made between
filing and classification. Because of this, the administrative problem
of filing and the qualified expert problem of classification combine
to create severe problems.
The UNISIST Study (5) noted that little progress can yet be reported in the
way of indexing-at-source and that a serious limiting factor to any form of
cooperative indexing is the range of acceptability of the proposed indexes.
Even the all-embracing and widely used U.D.C. has adversaries. The Study also
noted that it is unlikely that the concept of a universal scheme will ever make
any practical sense in the realm of deep content analysis (p. 46). The reasons
are the observed differences in the semantic basis of indexing languages which
are the consequence of well-founded differences in outlook and interests on
the part of a highly-diversified community of users.
All that can be looked for, according to the UNISIST Study, "is the existence
of semantic relations between the different lexical sets (be they called classifications,
lists of disciplines, thesauri, automatic dictionaries for converting natural
language into information language, etc). The study of these relations is the
subject of ongoing research on the "compatibility" of indexing vocabularies
.... the subject is now receiving much attention as an essential part of projects
aimed at establishing world-wide interconnections between information systems."
It would appear from this, that the distinction between the impracticalities
of classification and the practicalities of "relationship identification"
(i.e. modeling) is becoming established. But the filing or administrative aspect
of "entity capture" is now blurred into the modeling phase. There
is as yet no suggestion that work on "compatibility" would be considerably
facilitated if similar filing techniques were used prior to the activity
at the modeling level at which the "well-founded" theoretical differences
arise. Standardization is possible, but at a lower level consistent with user
requirements. Until this is realized the relationship between lexical sets cannot
be handled systematically by computer methods.
3. Advantages of Numerical Filing System
The three major advantages of a sequential, non-significant numbering system
for entities are:
- facilitation of administrative activity by removing the burden of requiring
that the file number receive the "imprimatur" of an overloaded
- preparation of the basis for a proper semantic analysis by avoiding "the
difficulty encountered in manipulating semantic reality without the assistance
of a corresponding concrete reality" (6) and permitting "semantic
facts to be treated independently of their formal (linguistic) supports"(6).
- admission of "artificial" theoretical entities (new concepts,
groupings of other concepts) for which no simple term exists or for which
a questionable neologism would have to be invented. This is difficult in the
case of term oriented systems.
Entities, relationships and models
1. Types of Entity Included
There is a very varied terminology currently in use to characterize theoretical
products. Gunnar Sjöblom notes the use of conceptual (analytical, theoretical)
frameworks, analytical schemes, paradigms, orientations, frameworks for inquiry,
theory-sketches, pre-theories, etc.(7) The same is true for the components of
the scientific process: problems, observations, empirical generalizations,
models, derived propositions, hypotheses, theories, etc. It is unlikely that
any immediate agreement could be achieved on a standard terminology, even if
this was in fact beneficial.
Each of the conceptual constructs represented by the above terms may be treated
as an "entity" which could be incorporated into a computer file.
Once incorporated, efforts could be made to attach an appropriate distinguishing
code to them within the framework of a given model. It is highly probable,
for example, that under different models the same entity may be coded differently,
or alternatively that distinctions important within one model will be insignificant
in another (e.g. theory and model; hypothesis and proposition).
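The idea that one filed entity may carry different codes under different models can be shown with a small Python sketch. Keying the code table on the pair (model, entity number) is an assumed layout, and the model names and codes are invented for the example.

```python
codes = {}  # (model name, entity number) -> code within that model


def assign(model, entity, code):
    """Record the code a given model attaches to a filed entity."""
    codes[(model, entity)] = code


def codes_for(entity):
    """Return every model's code for one entity."""
    return {
        model: code
        for (model, number), code in codes.items()
        if number == entity
    }


assign("model A", 42, "theory")
assign("model B", 42, "model")
assign("model B", 43, "hypothesis")
entity_42 = codes_for(42)
```

Entity 42 keeps a single file number throughout; only its coding varies from model to model.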
As a summary, the above entities are numbered below to facilitate discussion
on possible groups of entities:
- 1. theories
- 2. propositions
- 3. hypotheses
- 4. models
- 5. analyses
- 6. conceptual frameworks
- 7. analytical schemes
- 8. theory sketches
- 1. paradigms
- 2. viewpoints
- 3. schools of thought
- 1. assumptions
- 2. criteria
- 3. values
- 1. substantive
- 2. methodological
- 3. problem formulation
- 1. taxonomy
- 2. typology
- 3. classification
- 1. bodies of data
- 2. interpretations of data
- 3. observations
There is some advantage in a two-level coding here, because it might be possible
to arrive more easily at agreement on the more general level coding, even if
there are differences between models on the coding within that level. There
is of course the possibility that within a particular model the grouping would
be done differently, in which case the coding scheme would be peculiar to that
2. Types of Relationship Included
It is not the intention of this project to set up a single rigid classification
of permissible relationships between entities. Just as no effort was made to
limit the types of entities that could be handled (see above), it should not
be necessary to make the futile attempt to resolve the intellectual problem
of how many types of relationship are significant. That the attempt would be
futile on the part of any one group is shown by Eric de Grolier's excellent
chapters on the expression of relationships in generalized and specialized
coding systems, in natural languages, and in experimental languages (8). He
concludes, in his UNESCO/FID supported study, that it proved impossible to produce
a systematization that was "sufficiently satisfactory to warrant even preliminary
This conclusion should not however lead to a decision to adopt some hypothetical
"best existing scheme" or to the formulation of a new scheme. It
should be recognized that the project should be capable of handling as many
different schemes as possible. In fact the evolution of knowledge is partly
represented by attempts to produce new schemes of relationship and categorization.
Without recommending any particular scheme, it is useful to attempt to list
out some of the relationships to give an idea of the variety that has been envisaged.
De Grolier suggested a clarification of the sign ":" in the UDC (rejected
by the FID Central Committee on Classification for the UDC) which covered the
1.1 Appurtenance (belonging)
- 11 Inclusion, implication
- 12 Parts, organs
- 13 Components, constituents
- 14 Properties, attributes
- 141 Properties, attributes: physical
- 142 Properties, attributes: chemical
- 143 Properties, attributes: biological
- 15 Aptitudes, predispositions
- 21 Action: acting on (subject), affected by (object)
- 211 Favourable (stimulation; increase)
- 212 Unfavourable
- 2121 Delay
- 2122 Inhibition
- 2123 Destruction
- 21 Interaction
- 211 Favourable (symbiosis)
- 212 Unfavourable (antagonism, competition)
- 22 Operation, means used: process (subject), product, result (object)
- 3 Causality, origin, etc.
- 31 Causality; cause (subject), effect (object)
- 32 Origin: originating (subject), arising from (object)
- 33 Conditioning, requirement: conditioning (subject), conditioned (object)
- 3 Interdependence
- 31 Correlation
- 32 Association
- 33 Combination, synthesis
- 41 Aspect, particular case
- 42 Application
- 43 Use
- 51 Resemblance, likeness, similarity
- 511 Analogy
- 512 Equality, identity
- 52 Dissimilarity, unlikeness
- 521 Difference
- 522 Opposition (of character)
Other typologies of relationships have been formulated by Gardin, Farradane,
Perry and Kent, Juilland. Each uses very different and, at least superficially,
unrelated categories. The different models envisaged in the next section encompass
compositional, behavioural, didactic, historical, cybernetic and problem-oriented
3. Types of Model
It is important to keep in mind the many possible uses of the proposed computer-based
filing system. Concentration on one set of uses may not necessarily keep the
system alive either in terms of funding or value to current research activity.
Multiple demands on it would ensure multiplicity of fund sources and many bodies
willing to feed in entities and assist in different aspects of the coding.
The following types of model are an illustration of the possible lines of development.
The list does not pretend to be exhaustive, so other kinds of model could
be included. An attempt has been made to group the models into types which
in some cases might usefully be treated on the same occasion by the responsible
It is important to note that the models are not only simple hierarchies but
can also be networks of relationships in cases where categories overlap or one
entity can be a component of several other entities.
Group 1: Current Structures
This is a poor title but refers to all the current and new structures and relationships
as made up of:
1.1 Compositional Models: These models would be primarily concerned
with the manner in which entities are nested within one another to form hierarchies.
Six types of relationship are possible here in three sets of two.
a) Meta-level: reference numbers of all entities of which this entity is a component.
(This relationship could be split into two sub-types as the computer-level
data formats for other types of model require such a split.)
Examples are: theories in which this concept is used, general class of concepts
to which this concept belongs, general problems of which this problem is a
part, organizations of which this organizational unit is a member.
b) Sub-level: reference numbers of all entities which are components of this
entity. (This relationship could be split into two sub-types for the same
reasons as above.)
Examples are: concepts used in this theory, concepts which belong to this
class of concepts, properties or attributes of this concept, sub-problems
of this problem, organizational units which are members of this organization,
c) Associated: reference numbers of all relevant entries which have a horizontal
relationship to this entity.
- See-also entities, namely those which should also be borne in mind when
considering this entity.
Examples are: cases of insufficient terminological precision.
- Use-instead entities, namely those which should be substituted for this entity.
Examples are: cases where an entity is outmoded for that model.
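The compositional relationships listed above could be held against each file number roughly as in the Python sketch below. The class name Entry and its four fields are hypothetical, and the sketch does not split the meta-level and sub-level relationships into the two sub-types mentioned in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    number: int
    meta: set = field(default_factory=set)         # entities this one is part of
    sub: set = field(default_factory=set)          # components of this entity
    see_also: set = field(default_factory=set)     # horizontal: bear in mind
    use_instead: set = field(default_factory=set)  # horizontal: substitute


def link_component(entries, whole, part):
    """Record a meta-level/sub-level pair so both views stay consistent."""
    entries.setdefault(whole, Entry(whole)).sub.add(part)
    entries.setdefault(part, Entry(part)).meta.add(whole)


entries = {}
link_component(entries, 10, 11)   # e.g. concept 11 is used in theory 10
link_component(entries, 10, 12)
entries[12].use_instead.add(11)   # 12 is outmoded in this model; use 11
```

Recording each component link from both ends means the meta-level list of one entity and the sub-level list of the other cannot drift apart.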
An interesting map of relationships between conceptual entities is given in
Figure 1. This shows the interlocking and meeting of concepts associated with
measurement of simple physical phenomena.
1.2 Behavioural Models: At the same time that the modeling activity
is undertaken on the compositional relationship in 1.1, it should be useful to
consider some non-compositional relationships to other entities. In other words,
the effects of the presence of one conceptual entity on another in the "ecosystem
of ideas"(9). By this is meant concepts which are indirectly undermined
or strengthened by the validity of this concept, organizations whose monopoly
is weakened by the presence of this organization.
Group 2: Contextual Structures
Again this is a poor title but refers to the historical and comprehensional relationships
which constitute a context for the Group 1 current situation, and would be used
in learning about the Group 1 situation.
2.1 Educational Models: These models would be produced by those modeling
groups primarily concerned with education and making more sophisticated concepts
2.2 Historical Models: These models would be produced by those modeling
groups interested either in historical research on the history of ideas or in
providing an historical framework to assist education. It is probable that the
educational and historical models should be considered together, which is why
they have been grouped.
Group 3: Real World Systemic Relationships
The previous groups of models deal with the relationship between conceptual
entities in anthropocentric terms or within the logic of particular disciplines.
It is also useful to consider the systemic effects of real world entities on
one another. This produces another pattern of relationships between the entities
The best example of this distinction is the inter-disciplinary nature of environmental
problems, when for example, it is the real world interaction of chemicals in
food chains which causes egg shells to become thin -- leading to a high chick mortality
rate in some bird species. For a social example, the relationships shown between
the entities, represented by boxes in Figure 2, give a schematic representation
of the factors binding a Canadian Indian to a pattern of problems.
Group 4: Term-oriented Models
In some cases where classification is rudimentary or non-existent, the emphasis
is placed immediately on the terms. This is the case when:
- official terms are used and the definitions are conventional or undefined
as in many library or descriptor lists. The entity is defined by the term.
- a particular official definition exists for a particular term as in official
dictionaries (e.g. the Larousse Littré as reflecting the decisions of the Academie
- terms are related in a thesaurus without definitions (e.g. as in Roget's
Thesaurus). Such thesauri may have many levels of classification.
There is no reason why each such set of terms should not be treated as a model
as in the other groups. Where appropriate, the classification code position
would be omitted and only the term positions used.
Group 5: Administrative Models
The assumption made in discussing the earlier groups of models was that the
model was in some way a definitive structure on which new work would build.
It is however possible to use the model building code to facilitate the administrative
work on the definitive model.
Group 6: Mission-oriented Models
An assumption made in earlier groups is that the modeling bodies would all
be discipline-oriented. There is however no reason why mission-oriented models
should not be used where appropriate (e.g. in connection with development, environmental
Group 7: Interdisciplinary Models
Clearly it is most important to avoid a "babble of models". A second
level operation of model reconciliation to form a set of interdisciplinary or
inter-model models could therefore be instituted when required.
These could either (i) be constructed (automatically by computer) from all
the entities common to the models from which it is desired to produce an inter-disciplinary
model, or (ii) be constructed by selection based on judgement of the best from
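Option (i) amounts to a set intersection over the entity sequence numbers used by each contributing model. The following is only a minimal sketch in Python under that reading: the function name `common_entities`, the dictionary layout, and the entity numbers are assumptions made for illustration, not part of the proposal.

```python
def common_entities(models: dict, chosen: list) -> set:
    """Return the entity sequence numbers present in every chosen model."""
    entity_sets = [models[number] for number in chosen]
    return set.intersection(*entity_sets) if entity_sets else set()


# Hypothetical data: model number -> entity sequence numbers used in that model.
models = {
    362: {101, 102, 103, 250},
    363: {102, 103, 400},
    410: {102, 103, 999},
}

# Entities that could seed an inter-model model for these three models.
shared = common_entities(models, [362, 363, 410])
```

Option (ii), selection by judgement, has no such mechanical equivalent and is not sketched here.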
Group 8: Future-oriented Models
A final assumption made in dealing with the earlier groups was that only the
current or historical situations would be modeled. There is however no reason
why speculative models should not be produced showing the relationships between
entities at different points in the future. The modeling activity might then
in some ways represent the Delphi method of forecasting.
Group 9: Personal Models
Perhaps a long term ideal is for a person to be able to "look at"
(or interface with) the basic list of entities in terms of his own model which
is his personal "thought file". Each new idea he gets could be usefully
reflected in the structure of this file.
Group 10: Sub-models
In some cases a particular sub-branch of knowledge may be fragmented by reinterpretation,
reconceptualization and redefinition of the same entities. It is then appropriate
to use a "sub-modeling" strategy. In other words, instead of requiring
"dissident" groups to conform or to divert their energies into a parallel
model with differences in a minor area, a sub-model could be used to redefine
that area in the dissident group's terms. The sub-model would therefore offer
an alternative interpretation.
Group 11: Languages as sub-models
It may be convenient, for some purposes, to consider the relationships between
theoretical formulations used in a particular language as a sub-model. The differences
between the concepts encountered in Indo-European languages are relatively minor,
so that term equivalents pose no great problems, but should it be necessary
to enrich the system by incorporating theoretical formulations from other language
groups problems could arise.
Data to be included on each entity
A. Concept Filing Phase (Identification or Registration)
1. Entity Sequence Number (10): Each new conceptual entity, of
whatever type (see earlier section), receives a unique number which is the next
available in a sequential list. The number therefore contains no significant
digits or codes and has no meaning for classification purposes. (It may be an
advantage to use the check digit technique.)
For practical purposes, it may be convenient to pre-allocate blocks of numbers
to different filing centres whenever required. This avoids problems of duplication
and speeds up administration. Where duplication does occur, this is eliminated
at the modeling stage.
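The pre-allocation of blocks can be pictured as follows. This is a sketch only: the class name `BlockAllocator`, the block size of 1000, and the filing centre names are hypothetical, since the text fixes none of them.

```python
from dataclasses import dataclass, field


@dataclass
class BlockAllocator:
    """Hands out non-overlapping blocks of sequence numbers to filing centres."""
    block_size: int = 1000                      # assumed; no size is specified
    next_free: int = 1                          # first unallocated number
    blocks: dict = field(default_factory=dict)  # centre -> list of ranges

    def allocate(self, centre: str) -> range:
        """Reserve the next free block for a filing centre."""
        block = range(self.next_free, self.next_free + self.block_size)
        self.next_free += self.block_size
        self.blocks.setdefault(centre, []).append(block)
        return block


allocator = BlockAllocator(block_size=1000)
geneva = allocator.allocate("GENEVA")        # numbers 1 to 1000
brussels = allocator.allocate("BRUSSELS")    # numbers 1001 to 2000
geneva_more = allocator.allocate("GENEVA")   # numbers 2001 to 3000
```

Because the numbers carry no meaning, a centre that exhausts its block simply requests another; the blocks need not be contiguous.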
One advantage of this sequence number as a concept identifier is that it is
not necessary to file a definition or conventional term at the same time. This
is convenient if a new theoretical formulation has been tentatively conceived
with known relationships to other concepts but with no clear definition or label
yet. It avoids the need to coin doubtful neologisms in order to register the
concept. In some cases it may even be an advantage to leave the term defined
by its context of relationships, and not to bother attempting to find a suitable
term. In which case the sequence number would be used as the only identifier
until a suitable terminology for concepts in that domain can be elaborated more
2. Model Description
2.1 Model Number: The act of filing an entity is distinct from
the later modeling activity. The "model number" in this case is
"0". This artifice permits the definitions and the conventional
terms or labels in different languages to be handled within the computer record
framework together with the modeling and term allocation activity.
2.2 Sub-model Number (see model Group 10): Again, since entity
filing is distinct from the later modeling activity, this zone is "free".
It is therefore used to distinguish between:
- entity definitions (for which it is "0")
- entity conventional labels or terms (for which it is "1")
2.3 Language (see model Group 11): Since the definitions or
the label may be given in several languages, a language code is used (e.g.
English "1", French "2", etc.).
2.4 Alternatives: There are bound to be cases, for a given
language, in which alternatively worded definitions (with the same meaning)
are put forward. Similarly, where several conventional terms or labels referring
to the same entity exist, these may also have to be filed. A simple
sequential code ("1", "2", etc.) is therefore used to
distinguish between successive alternatives.
3. Cross-reference: Cross-references are used during the modeling phase
so that this zone is "free". It is, however, used in this phase to
identify the sequence number of:
- other entities which use the same conventional labels as this entity (i.e.
where the same label is used with a different meaning)
- other entities which are defined using the same verbal definition (but
for which the definition has a different meaning). This may be a low-frequency
or trivial case.
4. Source Code: There are several possible ways of handling information
about the source of information on the entity.
4.1 Ignore: In a simplified system it is not necessary to include
it since such information can be found in a backup card file.
4.2 Abbreviate: Some general code, indicating the country, the publication
or the filing group can be used.
4.3 Name: The name of the person, or filing organization, may be given
in some abridged form (e.g. "DEUTKW" for Karl W. Deutsch).
4.4 Name and Support: In a more elaborate system, in which members
of a discipline are expected to indicate any strong "support" or
"opposition" to any new theoretical formulation, a "voting"
technique may be envisaged (see page ). This option could be confined to the
"elders" of the profession -- or left open to all members of a profession.
As "professional" activity, this might be restricted to the modeling
A given member of the profession, if sufficiently aroused, could then file
his support or opposition in the form "DEUTKW +" or "DEUTKW
4.5 Name and Reference: It might be thought more valuable to give
not only the name but the reference to the document in which the theoretical
formulation is discussed and justified. On the question of abbreviations to
document reference, one is immediately in the jungle of dispute amongst librarians,
documentalists, etc. Several possibilities exist.
4.5.1 Use an extended bibliographical "standard" reference.
This uses a lot of space and is mainly pleasing to librarians.
4.5.2 Use an abbreviated reference as in "Science Citation
Index" (e.g. the first four letters of the first two significant words
of the title, plus the year date, issue or volume number within which pagination
is consecutive, and the first significant page number -- "DEUT KW --
NERV GOVE -- 1963 -- O -- 192").
4.5.3 Use a sequence number code. To avoid getting bogged down in
documentation problems, a simple sequence number could be used for each
either: i) referred to by the system (e.g. a complete sequence across all
or: ii) referred to by the system for a given author (e.g. starting from
zero for each new author).
A parallel "documentation" system would be required to decode
the codes used in the approach but it might prove much tidier and practical
in the long run (e.g. "DEUTKW 509") (11). The precise page numbers
might be an additional requirement (e.g. "DEUTKW 509-192"). Again,
as a "professional" activity, this might form part of the modeling
5. Model Descriptor: This is not used during this phase.
6. Relationship Descriptor: This is not used during this phase.
7. Date Codes
7.1 Date first used: The date on which a theoretical
formulation was first used is inserted here. If this is not supplied, the
computer can automatically insert the date on which the entity was filed.
7.2 Date last used: This date is supplied as a result
of general consensus by all modeling groups and is therefore not dealt with
during this phase.
7.3 Retention period: It may be an advantage in this
phase to tag some entities of unknown value so that they will automatically
be dropped from the system after a certain period unless some contrary instruction
is received in the meantime. Different retention periods can be used according
to the status of the source.
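The automatic dropping of tagged entities could work along the following lines. This is a sketch under assumptions: the text does not say how a retention period is expressed, so days are used here, and the status names and periods in `RETENTION_DAYS` are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical retention periods (in days) by status of the source.
# None stands for "keep until instructed otherwise".
RETENTION_DAYS = {"accredited": None, "unknown": 365}


def should_drop(filed_on: date, retention_days: Optional[int], today: date) -> bool:
    """True when the retention period has elapsed without a contrary instruction."""
    if retention_days is None:
        return False
    return today > filed_on + timedelta(days=retention_days)


filed = date(1973, 1, 1)
dropped_later = should_drop(filed, RETENTION_DAYS["unknown"], date(1974, 6, 1))
kept_for_now = not should_drop(filed, RETENTION_DAYS["unknown"], date(1973, 6, 1))
```

A "contrary instruction" would then amount to resetting the retention period of the record to `None` or to a longer value.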
8. Status: For administrative purposes it is convenient to have
a zone in which codes may be used to indicate that the entity is "under
consideration", "of doubtful value", "no longer used",
9. Text: The words or text used for:
- the conventional terms or labels
- the definitions
would be inserted into this zone. This zone could also be used for any special
comments which might be usefully added.
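Taken together, the zones used in the filing phase suggest a record of roughly the following shape. The sketch below is illustrative only: the field names, types, and sample texts are assumptions, and zones 5 and 6 are omitted because they are not used during this phase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FilingRecord:
    entity: int                             # 1.  entity sequence number
    sub_model: int                          # 2.2 0 = definition, 1 = label or term
    language: int                           # 2.3 e.g. 1 = English, 2 = French
    text: str                               # 9.  wording of the definition or label
    model: int = 0                          # 2.1 always 0 during filing
    alternative: int = 1                    # 2.4 sequential code for alternatives
    cross_reference: Optional[int] = None   # 3.  entity sharing the label or wording
    source: str = ""                        # 4.  e.g. "DEUTKW"
    date_first_used: Optional[str] = None   # 7.1
    retention_period: Optional[int] = None  # 7.3
    status: str = ""                        # 8.


# Hypothetical entity 1234 filed with an English definition and a French label.
definition = FilingRecord(entity=1234, sub_model=0, language=1,
                          text="(wording of the definition)", source="DEUTKW")
label = FilingRecord(entity=1234, sub_model=1, language=2,
                     text="(wording of the label)")
```

Note that both records share the same sequence number; only the sub-model, language, and alternatives zones tell them apart.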
B. Concept Coding Phase (Modeling or Classification)
Many of the zones discussed above are used in this phase but for a different
purpose or in order to establish computer records distinct from those created
during the earlier phase or by other modeling groups.
1. Entity Sequence Number: This is repeated for each new relationship
established within a model and is of course the same as that used in filing
the entity in A.1.
2. Model Descriptor
2.1 Model number: As discussed elsewhere (see page
), each modeling group receives a unique number (e.g. "362") which
identifies the system of relationships which are elaborated and filed, while
at the same time distinguishing it (at computer level) from any other systems.
There is some argument for attaching special significance to particular digits
of the model number with a view to clarifying a hierarchy of models or, at
least, showing a relationship between models. In other words, at this level
a U.D.C.-type approach might be used so that "political science"
models are all identified by "32N" and "anthropology"
models by "39N". This is probably a temptation to be resisted however,
since it has some theoretical implications which are better contained within
models. In which case a simple sequential list should be established from
which the next available model number could be taken.
2.2 Sub-model number: This is a zone to be used by a modeling group
whenever a level of dissent is encountered so that alternative sub-models
within the general model can be satisfactorily handled and identified.
Normally, in the absence of sub-models, this would be "0".
2.3 Language: Since the relationship between concepts is supposedly
language independent, this zone should normally be "0". There are,
however, cases where relationships are identifiable in one language but absent,
ridiculous, or ambiguous in a second. In such cases it may be convenient to
use this zone for a form of language-dependent sub-model.
2.4 Alternatives: This zone is not used in this phase and must
be "0" (to permit identification of the term records in the next
phase where it is non-zero).
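The way the same zones separate the different kinds of record can be summarised in a few lines. This is a sketch of one reading of the scheme, in which model "0" marks filing records and a non-zero alternatives zone marks term records; the function name `record_kind` and the returned labels are hypothetical.

```python
def record_kind(model: int, sub_model: int, alternative: int) -> str:
    """Classify a record from its model, sub-model and alternatives zones."""
    if model == 0:
        # Filing phase: the sub-model zone separates definitions from labels.
        return "definition" if sub_model == 0 else "label"
    if alternative == 0:
        # Coding phase: relationship records always carry "0" here.
        return "relationship"
    # Term allocation phase: the alternatives zone is "1" or greater.
    return "term"


kinds = [
    record_kind(0, 0, 1),      # filed definition
    record_kind(0, 1, 2),      # second alternative label
    record_kind(362, 0, 0),    # relationship within model 362
    record_kind(362, 0, 1),    # authoritative term within model 362
]
```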
3. Cross-references: This zone supplies the main means by which
the relationship of this entity to other entities is indicated for the particular
model indicated in 2.1. The sequence number of the other entity is indicated
here. In effect, every such "relationship" gives rise to a new computer
record (see figure 3). The type of relationship is either implicit, because
of the model used, or is described in 6 and 7.
4. Source Code: Depending on the method chosen (see A.4.1,
A.4.2, A.4.3, A.4.4, and A.4.5), the source coding would probably either be
allocated during the concept filing phase with nothing in this phase, or in
this phase with nothing in the previous phase. In the most sophisticated system,
it might however be desirable to give:
- source coding for the entity in the concept filing phase
- source coding for individual relationships within a model, during the modeling
Source coding during the modeling phase might be particularly helpful in the
administrative work of elaborating a model, since it permits members of a modeling
group, working independently and in isolation, to "vote" on the insertion
or deletion of particular relationships (see A.4.4). Such a postal vote system
would be particularly helpful in clarifying with precision just what was under
discussion at any point in time.
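A tally of such votes on one relationship might be computed as below. This is a sketch under assumptions: support is coded as in "DEUTKW +", opposition is assumed to use a matching "-" sign, a later vote from the same member is assumed to replace an earlier one, and all member codes other than DEUTKW are invented.

```python
from collections import Counter


def tally(votes: list) -> Counter:
    """Count support (+) and opposition (-), keeping each member's latest vote."""
    latest = {}
    for vote in votes:
        name, sign = vote.rsplit(" ", 1)
        if sign not in {"+", "-"}:
            raise ValueError(f"unrecognised vote code: {vote!r}")
        latest[name] = sign  # a later vote overrides an earlier one
    return Counter(latest.values())


# Hypothetical votes filed against a single relationship record.
votes = ["DEUTKW +", "SMITHJ -", "JONESA +", "DEUTKW -"]
result = tally(votes)
```

Here the first member changes his vote, so the relationship ends with one vote in support and two in opposition.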
5. Model Descriptor: This zone is used to indicate which model is
to be considered at the entity cross-referenced in 3. In a simplified system
this zone would not be required because the assumption would be made that each
model was totally isolated from other models.
In a more sophisticated system however, there is need for a means of expressing
relationships between parts of models. For example, it may be that in a certain
domain two models are identical or that one forms a subset of the other. In
such a case there is little need to duplicate all the relationships in the second
model, provided cross-reference between the models is possible.
5.1 Model number: As for 2.1, but the model is only to be entered
at the entity to which the cross-reference in 3 refers.
5.2 Sub-model number: As for 2.2, but again is only to be entered
at the entity to which the cross-reference in 3 refers.
5.3 Language: As for 2.3, but again is only to be entered at the entity
to which the cross-reference in 3 refers.
5.4 Alternatives: Not used. (This zone may even be omitted entirely.)
6. Relationship Descriptor: This zone is used to describe the
relationship constituted by the link between this entity and that cross-referenced
in 3. Two basic types of relationship descriptors may be distinguished.
6.1 Relationship descriptor A: This is used to give an indication
of the relative levels of the two entities related (e.g. class and
member), directions for flow (e.g. from or to), etc. These are used,
for example, to indicate any hierarchical relationships. These codes and the
cross-reference in 3 are all that is required for a graph-theoretical analysis
of the network of concepts. It is here that any "see other" code
would be inserted.
It is also important to indicate the type of relationship between two entities
(see page ..):
- logical (i.e. B includes A, etc.)
- consistency (contradiction/support)
- time (precedes/follows)
- cybernetic (information exchange)
- responsibility (flow of decisions)
This is an indication of what is flowing or the nature of the relationship.
It does not seem feasible to predetermine the possible types of relationship
which might be required. The technique which can be adopted is therefore to
use a simple numeric code - the next available in a sequential list - for
each new type of relationship with which a modeling group wishes to
The arrangement in this zone could be left up to the modeling group. It
is desirable that standard codes should be developed to facilitate graph-theoretical
analyses and that a standard code system should be used to denote types of
relationships (e.g. "321" where the numbers have no special significance).
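Since the cross-reference and the relationship descriptor are all that a graph-theoretical analysis needs, the relationship records of a model can be read directly as a directed graph. The sketch below assumes each record reduces to a triple (entity, cross-referenced entity, relationship type code), that the link runs from the first entity to the second, and that code 321 stands for a logical "includes" relationship; all of these, and the entity numbers, are illustrative assumptions.

```python
from collections import defaultdict, deque

# Hypothetical relationship records: (entity, cross-reference, type code).
records = [
    (10, 11, 321),
    (10, 12, 321),
    (11, 13, 321),
    (20, 13, 322),   # a different type of relationship
]


def build_graph(records, relationship_type):
    """Adjacency sets for one type of relationship."""
    graph = defaultdict(set)
    for entity, cross_ref, rel_type in records:
        if rel_type == relationship_type:
            graph[entity].add(cross_ref)
    return graph


def reachable(graph, start):
    """All entities reachable from start by following links (breadth-first)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


includes = build_graph(records, 321)
below_10 = reachable(includes, 10)   # everything entity 10 includes, at any depth
```

Filtering by type code before building the graph keeps hierarchical links apart from, say, time or consistency links between the same entities.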
6.2 Relationship descriptor B: This is used for evaluation
descriptors. In other words the codes used here supply some form of ranking
to the relationship described in 6.1 (e.g. some measure of relative importance
(within the model), some measure of degree of relativity, etc.)
It is in this zone that the degree of consensus on the characterization of
the concept by the discipline could be coded. The zone may even be used to
carry quantitative information on the size of flow represented by the relationship
and also its periodicity, if relevant.
Again, the arrangement of this zone could be left up to the modeling group.
It is however desirable that a standard form should be developed -- even if
exceptions to it are frequent.
7. Date codes
7.1 Date first used: This may be used to indicate the date each relationship
between entities was first noted, or alternatively the computer can automatically
insert the date on which the relationship was first filed.
7.2 Date last used: This date may be used when the relationship is finally
rejected as invalid or unacceptable.
7.3 Retention period: This zone may be used by members of a modeling
group to communicate with one another. A member may submit "trial balloon"
relationships, with a very short (one-cycle) retention period so that others
can "see how it looks". Once agreed, the retention period can be set
so that relationships periodically come up for review.
8. Status code: For modeling group administration purposes it is convenient
to have a zone which may be used to indicate that the relationship is "under
consideration", "a tentative proposal", "a firm proposal",
"agreed by the group", "requires priority attention", etc.
9. Text: Normally a relationship record should require no text.
There is however no reason why this zone should not be used for any text comments
on relationships which may seem significant to the modeling group.
C. Term Allocation Phase
1. Entity sequence number: Required as before.
2. Model descriptor A:
2.1 Model number: Required as before. A term can only be authoritatively
allocated within the modeling group. It is utopian to expect that consensus
can be consistently achieved between modeling groups on a unique authoritative
term for the entity to which they all refer in their different ways.
2.2 Sub-model number: This should normally be zero, since it will
probably be easier to achieve consensus on a term between model and sub-model
than between model and model.
2.3 Language: Required as before for each language version of the
2.4 Alternatives: This must be "1" or greater to distinguish
the term records from the relationship records. If alternative authoritative
terms are required in a given language the zone would be used to distinguish
3. Cross-reference: Normally this would be "0". It
may however be necessary to indicate other entities using the same term
(but obviously with a different meaning).
4. Source code: There may be some cases where it is important
to indicate the document in which the justification for the unique authoritative
term is urged.
5. Model descriptor B: May be required if the cross-reference
to a use of the same term in a different model is needed.
6. Relationship descriptor: Not required.
7. Date codes
7.1 Date first used: This may be used to indicate the date the term
was first used, or alternatively the computer can automatically insert the
date on which the term was first filed.
7.2 Date last used: Terms fall from favour. The last date of use can
be indicated here.
7.3 Retention period: May be used as in B.7.3.
8. Status code: May be used as in B.8.
9. Text: The words used in the authoritative term are inserted into
this zone. Alternatively, the equivalent decimal coding could be inserted, if
Limitation of scope and sources of concepts
1. Scope: The design of the system is sufficiently general that
it could be used to order theoretical formulations in any area of knowledge.
Such broad coverage would clearly be impracticable, and probably even undesirable,
in the foreseeable future.
It is useful to re-emphasize that the proposal is not
concerned with the areas covered by social science documentation as there are
many such documentation projects. The UNISIST report mentions the parallel programs
proposed by such bodies as the International Council of Social Sciences and
the International Committee for Social Sciences Documentation. There are numerous
equivalent projects at the national level. The object here is to concentrate
on theoretical formulations which may or may not be mentioned in a given collection
The priorities proposed would be based on three dimensions:
- commencing with the more abstract formulations and then moving to the more
specific or concrete
- commencing with formulations of interest to several social science disciplines
and then moving to those common to several schools of thought, and finally
to those current within one school of thought only. (The suggestion is that
an effort should be made to elaborate the significance of "inter-",
"multi-" or "trans-disciplinary" concepts as a priority
area of study with respect to knowledge analogous to the focus on international
relations as opposed to national level activities. The degree of interdisciplinarity
of a concept is a valuable means of determining priorities (12).)
- commencing with theoretical formulations before going on eventually to
methods and supporting data
This does not of course preclude any modeling group from concentrating
solely on the formulations of its own school of thought. The main concern however
should be to ensure that the system reflects the general framework of theoretical
formulations. Highly specialized formulations should not clutter up the modeling
activity. Little effort should be made to include minutiae about particular
social entities which have not been reflected in more general formulations --
unless such minutiae represent unique evidence of the need for new formulations.
The system should be compact and easy to use rather than large and unwieldy
as are most documentation systems.
2. Sources: Guidance in limiting scope can be obtained by concentrating,
in the light of the above priorities, on concepts mentioned in such publications as:
2.1 David L. Sills (Ed.), International Encyclopedia of the Social
Sciences, Macmillan, 1968.
2.2 Julius Gould and W.L. Kolb (Eds.), A Dictionary of the Social
Sciences (compiled under the auspices of Unesco), Free Press, 1964.
2.3 UNESCO. Main Trends of Research in the Social and Human Sciences.
Paris, Unesco, (Part one: social sciences, 1970, 819 p.; Part two: human sciences,
1972). Also in French edition.
2.4 International Committee for Social Sciences Documentation. International
bibliography of the social sciences. Tavistock, 4 annual volumes (sociology,
political science, economics, social and cultural anthropology).
2.5 Key textbooks in each discipline.
2.6 Specialized multi-lingual dictionaries and glossaries, such as:
- Gunter Haenich. Dictionary of International relations and politics;
systematic and alphabetical in four languages (German/English/French/Spanish).
Elsevier, 1965. This dictionary has 5778 terms with equivalents in the
- I. Paenson. English/French/Spanish/Russian Systematic Glossary of Select
Economic and Social Terms. Oxford, Pergamon, 1964. Attempts to present
a system of inter-related concepts which reflect a vertical hierarchy and
are presented within a continuous text in a systematic exposition of a given
2.7 Institute for Scientific Information. Social Sciences Citation
Index. Philadelphia. (Included in the SSCI are three separate but related indexes
of different periodicity covering the literature of the specified calendar
year. Price in 1973: $1,250 (sic).)
2.8 International social science organisations. A preliminary
count indicates that possibly some 30 such bodies could contribute in some way
to the project.
Concept notation in documents
It has been stressed that this project does not require a complex notation
system since each concept is represented by a single sequence number, plus an
indication of the model number in question, if required. Nevertheless, since
one object of this approach is to permit scholars to refer with precision to
a particular concept in their papers, a standard method of indicating such a
concept in print is required.
A similar problem arises in the natural sciences in distinguishing between
different isotopes of the same atom (i.e. cases where slightly different versions
of the same atom exist due to differences in atomic weight), where the same
symbol does not distinguish between isotopes. The solution adopted is to indicate
the atomic weight as a superscript to the standard symbol.
In the case of concepts, represented in print by the same word, one solution
would be to use the sequence number of the concept as superscript to the word:
To avoid confusion with bibliographical references, the number could perhaps
be preceded by an asterisk.
There is a strong temptation to adopt a technique for uniquely identifying
concepts similar to that of the International Standard Book Numbering (ISBN)
system (now used on the reverse of all recent book title pages) to give a unique
code to each book. This number consists of 10 digits made up of the following
The total length is 10 digits, but the three identifiers
only total 9 digits. In order to avoid wastage of numbers or lack of
sufficient numbers, publishers with a large book output (of which there are
few) have a two or three digit identifier so that the title identifiers can
use six or five digits. A small publisher (of which there are many) has a five
or six digit identifier so that the title identifier can use two or three digits.
The publisher identifier is therefore selected on the basis of his output using
from two to six digits as required. Hyphen separators are used.
- group identifiers (i.e., national, geographical, language or other
convenient group). An "agency" coordinates the allocation of numbers
within each group e.g., one for Anglo-American publications ("a"),
one for UN system publications, etc. The group identifier is allocated by
an international standard book numbering agency (in formation). (This could
be considered as a concept filing centre identifier allocated by some
loose coordinating body.)
- book publisher identifiers. The publisher identifier is allocated
internally within the group by the group agency. (This could be considered
as an accredited concept filing source identifier allocated with respect
to the filing centre for which it locates new conceptual entities.)
- book title identifiers. A block of sequence numbers is reserved
for each publisher to permit him to select the next available for the next
book. This could be considered as a block of sequence numbers for concepts,
so that each accredited source can select the next number as each new concept
- check digit. This ensures that the code has been correctly transcribed
and input to the computer. A computer pre-generated list of "available"
sequence numbers incorporates this digit (which is calculated on a modulus
11 with weights 10-2, using X in lieu of 10 where 10 would occur as a check
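The check digit calculation described here (modulus 11, weights 10 down to 2, X standing for 10) can be sketched as follows. The function name `isbn10_check_digit` is chosen for illustration, and the two nine-digit inputs are merely example numbers used to exercise the arithmetic.

```python
def isbn10_check_digit(first_nine: str) -> str:
    """Check digit for nine digits: modulus 11, weights 10 down to 2."""
    digits = [int(ch) for ch in first_nine if ch.isdigit()]  # ignore hyphens
    if len(digits) != 9:
        raise ValueError("expected exactly nine digits")
    total = sum(weight * digit
                for weight, digit in zip(range(10, 1, -1), digits))
    # Choose the digit that brings the weighted sum up to a multiple of 11.
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


ordinary = isbn10_check_digit("0-306-40615")   # weighted sum 130 -> "2"
with_x = isbn10_check_digit("0-8044-2957")     # weighted sum 199 -> "X"
```

The same routine would serve for a pre-generated list of concept sequence numbers if the check digit technique mentioned under the entity sequence number were adopted.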
The temptation to use this system should however be resisted. While the significance
attached to the digits is only "administrative" and has no "theoretical"
implications, problems of overflowing the allocated blocks are bound to occur.
The system will "bulge" in unpredictable areas as the U.D.C. has done.
It is also questionable whether so much significance should be placed on the
source which, once the concept has been incorporated, will quickly become irrelevant
within the network of other related concepts from other sources.