1973
Project Features toward a Concept Inventory
Originally appeared in 1973 as part of Toward a Concept Inventory

Organization of project

The success of a project of this type would be dependent upon the extent to which any central organisation can be avoided in favour of a process of catalysis. There is too much to be done to run the risk of the usual jurisdictional, behavioural, and personality problems associated with a centralised organisation. Such problems rapidly alienate potential support. The problem is therefore to bring into existence a decentralised network of groups working on different aspects of the project, but able to exchange the results of their activities without difficulty.

(It is important to remember that it is probably impossible to "organise" a whole area of knowledge because the latter is well subdivided into territories and "stamping grounds" whose incumbents are reasonably content with the current situation. It may, however, be possible to offer them a reasonably neutral device by which they can each facilitate and order their own particular approach, and, as a by-product, see more clearly its relationship to that in other "neighbouring" territories. Having by this means obtained a decentralised picture of the current situation, it is then possible, in a totally distinct process, to lobby the incumbents into participating to some degree in inter-territory efforts at organising areas of knowledge whilst guaranteeing safeguards for the protection of their "sovereignty".)

A. Launching Phases

A number of phases can be envisaged, some of which could overlap.

1. Investigation: During this phase the project would be investigated in detail by circulating proposals among appropriate specialists. The main object would be to ensure that the proposal is oriented in the right direction, and that funds for pilot projects are obtained. This phase may be considered to be underway already, through the actions of the COCTA committee.

2. Pilot Projects:
During this phase efforts would be devoted to the following areas:

3. Agreement on Standard Formats: On the basis of the previous phase, standard formats for filing new formulations and for holding them on magnetic media would be agreed. Since this is a new type of project, it should not encounter the apparently insurmountable difficulties of those concerned with organizing the computerized exchange of bibliographical information.

4. Production of Standard Software: Once agreement has been reached, a standard computer program can be made available to all those bodies which wish to initiate some concept modeling activity, to act as a central filing point for their particular constituency. It is possible that initially only one body will be active, possibly as an extension of the pilot project stage.

5. Filing Procedure: Once a standard filing or registration form is developed, there should be no difficulty for any group in receiving and filing identified concepts. This can of course be done by mail. By filing is meant the purely administrative activity of preparing the forms for the computer. There should be a minimum of judgmental effort at this stage, and none with respect to the theoretical problems of the subsequent modeling activity. The object is to get the incoming information into a form which facilitates the activities of the members of the modeling bodies. The area of difficulty which does require examination is that of how to decide who should not be permitted to submit concepts for filing into the common data base. This point is considered below.

B. Periodic Operations

1. Lists of Formulations: Periodically the sequence of identified concepts held on magnetic tape should be scanned to produce lists for circulation to the modeling bodies and, if required, their members. Two types of lists can be envisaged.
2. Modeling or classification: The lists derived from the previous operation can be examined by the modeling bodies in committee or distributed by post to their members. From these (postal) deliberations should emerge a collective opinion on the place, within the classification scheme, of each identified concept reviewed. If necessary, a "provisional" view can be formulated by the use of appropriate coding. In fact this might be a most useful way of submitting a committee's view for wider consideration. Different degrees of "definitiveness" could thus be envisaged.

3. Feedback of Model Information: The details of the place of the concept within a particular model would be indicated on a standard form which could be returned by post for keypunching and incorporation. A modification of this approach would be to permit individual committee members to each return forms for any new entity under consideration. In this way all the alternatives would be incorporated into the model with some "provisional" code so that each member could see the proposals of the others, and their implications. In some cases, this could even be operated as a means of postal voting on the treatment of controversial concepts. The administrative load of the committee is in this way largely computerized.

4. Input of Model Information: The forms from each modeling body would be handled at the central registry point (for that constituency), keypunched and fed onto the magnetic tape file. Keypunching errors would be corrected there as far as possible.

5. Production of Model Amendment Lists: Whenever required, the concepts incorporated into a given model would be selected and sorted into the thesaurus-type structure appropriate to the model and listed for distribution back to the members of the modeling body. This gives members an updated model with all the concepts coded to different levels of "provisionality".
Members can then reconsider their views and proceed from Operation 2 above or, alternatively, for those formulations which have been classified to the agreement of all concerned, the term allocation operation may be initiated. 6. Allocation of Model Terms: Working from the concepts structured into a thesaurus-type order, members can allocate terms to each entity in English and whatever other languages are considered necessary. Again, there is no reason why "provisional" coding should not be used to cover various working cycles of term allocation. 7. Feedback of Term Information: As with model information, the alphanumeric terms allocated to each concept can be indicated on a standard form which could be returned by post for keypunching and incorporation onto magnetic tape. 8. Input of Term Information: The forms from each body allocating terms within a model would be handled at a central registry point, as with the model information itself. 9. Production of Term Lists: Whenever required, the concepts incorporated into the model would be selected and sorted into term lists, either in alphanumeric order or in terms of a thesaurus-type structure. This gives members an updated model expressed in terms coded to different levels of "provisionality". Members can then reconsider their views and proceed from Operation 6 above. It is clear that the above operations permit a quite extensive degree of "de-committeefication". Members of a modeling body can individually register their views and preferences by post on each concept in the model and in their own time. The resulting lists are circulated and amended to firm up progressively the consensus on each point until final agreement can be reached. Alternatively, if this is a final difference of opinion, then this can be registered as such. Actual discussion need only take place when the accumulation of cases (which cannot be handled by correspondence and a "modeling bulletin" mechanism) merits such contact. C. 
Subsequent Phases

A number of phases can be envisaged which follow on from those detailed in "A" above. They do not, however, modify the basic operations noted in "B".

1. File Movement: One of the disadvantages of isolated registration points is that concepts common to two or more constituencies will not necessarily be juxtaposed. In particular, unless each such point is allocated a block of sequence numbers, there are liable to be overlapping sequence numbering systems which would jeopardize the whole project. One means of avoiding this, aside from allocating blocks of numbers to each registering point, is to circulate copies of the files between registration points. (Either the tapes themselves could be moved or data links could be used.) This might be considered a standard procedure by which duplicates in all newly-coded concepts could be located and grouped together for consideration by each of the interested modeling bodies, prior to arriving at a "final" decision.

The circulation of such information can be made very rapid. A courier file can be circulated between the registration points for a particular discipline. Information is copied onto and off each such sub-speciality. At one point in its movement, such an intra-discipline file could interact with an inter-discipline file (e.g. for disciplines in the same group) to permit a similar two-way transfer to take place. Similarly a higher level courier file moving between groups of disciplines could permit further exchange. In this way cross-disciplinary confusion could be avoided.

Clearly refinements are possible by using mission-oriented files or geographical area files. The system is very flexible. It could even be made to interact with "classified" files by using security, subject matter and evaluation filters to govern the interaction. The key feature is that it does not require more than a bare minimum of overall organization or funding.
It can be extended very loosely in response to the initiative of any highly specialized discipline. Registration points are created wherever (in terms of subject, jurisdiction or geographical level) there is sufficient common interest -- i.e. motivation plus resources. This gets around the current situation in which vain attempts are made to get significant funding for general multi-purpose projects, particularly via any international program. If cross-jurisdictional problems arise in particular areas, all the administrative work there may be delegated under contract to some party judged to be impartial and uninvolved: a commercial computer service bureau, a university, or a government agency. Alternatively, a user cooperative point might be organized. The costs involved at each collecting point are:
The funds are expended locally in a manner which can be immediately justified and yet this results in making available current information from points very conceptually distant within the system. D. Accredited Sources It is clearly an advantage to allocate responsibility for modeling group activity in a particular domain to the appropriate international professional organization. The difficulty arises in determining which sources of information should be recognized by such modeling groups. In the earlier phases when the group is working through the standard texts, few problems should arise. But once a model is available for inspection, problems will arise in determining whose suggestions for additions or amendments should be accepted. Within a well-defined profession this difficulty may be avoided by recognizing only accredited members of the profession. The right to submit amendments then becomes a right accorded by the profession. This procedure will undoubtedly lead to conflict when areas common to a number of disciplines are considered (e.g. the social sciences, in general), unless each discipline is restricted to its own model. A distinction should also be made between the right to file an entity and the right to suggest amendments to the model. There is some advantage in giving wider access for filing, but limiting the "retention period" of the entity filed according to the professional standing of the filer. A later development could be the possibility of retaining entities only if a supporting "vote" was registered by an appropriate number of appropriately accredited persons. The degree of support could be a real time measure of the degree of significance to the discipline of a given theoretical formulation. Whatever procedure is adopted, it is essential, for the vitality and general relevance of the project, that a wide range of people and organizations should be in a position to add entities to the file given a few simple safeguards. 
In this way the interests of every relevant discipline, school of thought, problem area, "approach" or paradigm should be protected. The system would therefore be "open" to social scientists writing in any language or taking any epistemological or ideological position.

Classification and modeling

1. Nature of classification

There is a considerable terminological variation in the scientific literature that characterizes the use of the term "classification". Dalenius and Frank, after making this observation (3), define the term as follows:
This definition, whilst appearing to be inclusive, in fact only covers one type of classification, namely hierarchic classification where classes are mutually disjoint. Classification of theoretical formulations is one area in which classes may or may not be mutually disjoint.

J.H. Shera has made an excellent general assessment of the problems of general library classification in an article originally published in 1953 and reprinted in his book, Libraries and the Organization of Knowledge. He concludes that the hierarchical form in itself is not a sufficient basis for the classification of knowledge and that what is required is a directed graph, or non-hierarchic representation.

The relationship between hierarchic and non-hierarchic classification schemes has been the subject of considerable work by Jardine and Sibson (4). They are particularly interested in the stability of the classification produced by a given method as the amount of information (or number of attributes) is increased for the entities being classified. They are looking for measures of distortion introduced by the imposition of a given classification scheme. This work makes it clear that the process of classification can introduce distortion and that this can be avoided by using a directed graph representation.

In this project the distinction is made between the filing process, the classification process, and the term allocation process. It is useful to think of the first stage of the classification process as one of "relationship indication", in which the relationships of a given theoretical entity with other entities are inserted. This results in a "directed graph" network of entities which can be searched by computer, particularly to detect clusters with certain properties. This stage corresponds to the determination of similarity or dissimilarity between entities.
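The "relationship indication" stage can be sketched in a few lines of Python. This is only an illustration under stated assumptions: entities are identified by bare integers, each relationship is a (source, target) pair, and a "cluster" is taken to mean a group of entities connected by links in either direction. The function names `build_graph` and `clusters` are hypothetical, and other cluster properties could equally be searched for.

```python
from collections import defaultdict

def build_graph(relationships):
    """Return {entity: set of entities it points to} from (source, target) pairs."""
    graph = defaultdict(set)
    for source, target in relationships:
        graph[source].add(target)
        graph[target]  # register the target even if it has no outgoing links
    return dict(graph)

def clusters(graph):
    """Group entities into weakly connected clusters (link direction ignored)."""
    undirected = defaultdict(set)
    for source, targets in graph.items():
        undirected[source]  # keep isolated entities as one-member clusters
        for target in targets:
            undirected[source].add(target)
            undirected[target].add(source)
    seen, result = set(), []
    for start in sorted(undirected):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(undirected[node] - members)
        seen |= members
        result.append(sorted(members))
    return result

# Hypothetical sequence numbers standing in for filed entities.
links = [(101, 102), (102, 103), (201, 202)]
graph = build_graph(links)
found = clusters(graph)
```

Here `found` groups entities 101 to 103 into one cluster and 201 and 202 into another, without any class boundaries having been imposed on them.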
In a second stage, the above network can be distorted so that its elements can be fitted into a chosen set of classes with a certain relationship to one another. This is "classification" as opposed to the previous phase which inserts relationships irrespective of any class boundaries. It is convenient to call this activity "modeling". Clearly the modeling activity is a valuable preliminary to "classification". It is particularly valuable in that once completed, different systems of classification can be compared using the entities inter-related by the model, i.e. different degrees of distortion can be imposed upon the network of entities according to the immediate needs of the user.

It may be useful to think of modeling in this context as a long-term, multi-person activity, whereas a given classification can be selected from the modeled entities in terms of short-term, need-oriented considerations which permit certain relationships in the network to be considered as "irrelevant" -- permitting the isolation of simple, possibly hierarchic, classification schemes. In some cases, it may however be preferred not to distinguish modeling from classification and to blur the two operations into one another.

2. Filing and Classification

In the case of document indexing applications, no distinction is made between filing and classification. Because of this, the administrative problem of filing and the qualified expert problem of classification combine to create severe problems. The UNISIST Study (5) noted that little progress can yet be reported in the way of indexing-at-source and that a serious limiting factor to any form of cooperative indexing is the range of acceptability of the proposed indexes. Even the all-embracing and widely used U.D.C. has adversaries.
The Study also noted that it is unlikely that the concept of a universal scheme will ever make any practical sense in the realm of deep content analysis (p.46). The reasons are the observed differences in the semantic basis of indexing languages which are the consequence of well-founded differences in outlook and interests on the part of a highly-diversified community of users. All that can be looked for, according to the UNISIST Study, "is the existence of semantic relations between the different lexical sets (be they called classifications, lists of disciplines, thesauri, automatic dictionaries for converting natural language into information language, etc). The study of these relations is the subject of ongoing research on the "compatibility" of indexing vocabularies .... the subject is now receiving much attention as an essential part of projects aimed at establishing world-wide interconnections between information systems." (p.46)

It would appear from this that the distinction between the impracticalities of classification and the practicalities of "relationship identification" (i.e. modeling) is becoming established. But the filing or administrative aspect of "entity capture" is now blurred into the modeling phase. There is as yet no suggestion that work on "compatibility" would be considerably facilitated if similar filing techniques were used prior to the activity at the modeling level at which the "well-founded" theoretical differences arise. Standardization is possible, but at a lower level consistent with user requirements. Until this is realized the relationship between lexical sets cannot be handled systematically by computer methods.

3. Advantages of Numerical Filing System

The three major advantages of a sequential, non-significant numbering system for entities are:
Entities, relationships and models

1. Types of Entity Included

There is a very varied terminology currently in use to characterize theoretical products. Gunnar Sjöblom notes the use of conceptual (analytical, theoretical) frameworks, analytical schemes, paradigms, orientations, frameworks for inquiry, theory-sketches, pre-theories, etc. (7) The same is true for the components of the scientific process: problems, observations, empirical generalizations, models, derived propositions, hypotheses, theories, etc. It is unlikely that any immediate agreement could be achieved on a standard terminology, even if this was in fact beneficial.

Each of the conceptual constructs represented by the above terms may be treated as an "entity" which could be incorporated into a computer file. Once incorporated, efforts could be made to attach an appropriate distinguishing code to them within the framework of a given model. It is highly probable, for example, that under different models the same entity may be coded differently, or alternatively that distinctions important within one model will be insignificant in another (e.g. theory and model; hypothesis and proposition).

As a summary, the above entities are listed below to facilitate discussion on possible groups of entities:
A. Concepts
B. Meta-concepts
C. General
D. Assumptions
E. Methods
F. Problems
G. Hierarchies
H. Operationalization
I. Data
J. Social
There is some advantage in a two-level coding here, because it might be possible to arrive more easily at agreement on the more general level coding, even if there are differences between models on the coding within that level. There is of course the possibility that within a particular model the grouping would be done differently, in which case the coding scheme would be peculiar to that model.

2. Types of Relationship Included

It is not the intention of this project to set up a single rigid classification of permissible relationships between entities. Just as no effort was made to limit the types of entities that could be handled (see above), it should not be necessary to make the futile attempt to resolve the intellectual problem of how many types of relationship are significant. That the attempt would be futile on the part of any one group is shown by Eric de Grolier's excellent chapters on the expression of relationships in generalized and specialized coding systems, in natural languages, and in experimental languages (8). He concludes, in his UNESCO/FID supported study, that it proved impossible to produce a systematization that was "sufficiently satisfactory to warrant even preliminary publication".

This conclusion should not however lead to a decision to adopt some hypothetical "best existing scheme" or to the formulation of a new scheme. It should be recognized that the project should be capable of handling as many different schemes as possible. In fact the evolution of knowledge is partly represented by attempts to produce new schemes of relationship and categorization.

Without recommending any particular scheme, it is useful to attempt to list some of the relationships to give an idea of the variety that has been envisaged. De Grolier suggested a clarification of the sign ":" in the UDC (rejected by the FID Central Committee on Classification for the UDC) which covered the following relationships:
1.1 Appurtenance (belonging)
1.2 Process
1.3 Dependence
1.4 Orientation
1.5 Comparison 51 Resemblance, likeness, similarity
Other typologies of relationships have been formulated by Gardin, Farradane, Perry and Kent, and Juilland. Each uses very different and, at least superficially, unrelated categories. The different models envisaged in the next section encompass compositional, behavioural, didactic, historical, cybernetic and problem-oriented relationships.

3. Types of Model

It is important to keep in mind the many possible uses of the proposed computer-based filing system. Concentration on one set of uses may not necessarily keep the system alive either in terms of funding or value to current research activity. Multiple demands on it would ensure multiplicity of fund sources and many bodies willing to feed in entities and assist in different aspects of the coding.

The following types of model are an illustration of the possible lines of development. The list does not pretend to be exhaustive, so other kinds of model could be included. An attempt has been made to group the models into types which in some cases might usefully be treated on the same occasion by the responsible modeling group. It is important to note that the models are not only simple hierarchies but can also be networks of relationships in cases where categories overlap or one entity can be a component of several other entities.

Group 1: Current Structures

This is a poor title but refers to all the current and new structures and relationships as made up of:

1.1 Compositional Models: These models would be primarily concerned with the manner in which entities are nested within one another to form hierarchies. Six types of relationship are possible here in three sets of two.
An interesting map of relationships between conceptual entities is given in Figure 1. This shows the interlocking and meeting of concepts associated with measurement of simple physical phenomena.

1.2 Behavioural Models: At the same time that the modeling activity is undertaken on the compositional relationship in 1.1, it should be useful to consider some non-compositional relationships to other entities. In other words, the effects of the presence of one conceptual entity on another in the "ecosystem of ideas" (9). By this is meant concepts which are indirectly undermined or strengthened by the validity of this concept, organizations whose monopoly is weakened by the presence of this organization.

Group 2: Contextual Structures

Again this is a poor title but refers to the historical and comprehensional relationships which constitute a context for the Group 1 current situation, and would be used in learning about the Group 1 situation.

2.1 Educational Models: These models would be produced by those modeling groups primarily concerned with education and making more sophisticated concepts comprehensible.

2.2 Historical Models: These models would be produced by those modeling groups interested either in historical research on the history of ideas or in providing an historical framework to assist education. It is probable that the educational and historical models should be considered together, which is why they have been grouped.

Group 3: Real World Systemic Relationships

The previous groups of models deal with the relationship between conceptual entities in anthropocentric terms or within the logic of particular disciplines. It is also useful to consider the systemic effects of real world entities on one another. This produces another pattern of relationships between the entities registered.
The best example of this distinction is the inter-disciplinary nature of environmental problems, when for example, it is the real world interaction of chemicals in food chains which cause egg shells to become thin -- leading to high chick mortality rate of some bird species. For a social example, the relationship shown between the entities, represented by boxes in figure 2, give a schematic representation of the factors binding a Canadian Indian to a pattern of problems. Group 4: Term-oriented Models In some cases where classification is rudimentary or non-existent, the emphasis is placed immediately on the terms. This is the case when:
There is no reason why each such set of terms should not be treated as a model as in the other groups. Where appropriate, the classification code position would be omitted and only the term positions used. Group 5: Administrative Models The assumption made in discussing the earlier groups of models was that the model was in some way a definitive structure on which new work would build. It is however possible to use the model building code to facilitate the administrative work on the definitive model. Group 6: Mission-oriented Models An assumption made in earlier groups is that the modeling bodies would all be discipline-oriented. There is however no reason why mission-oriented models should not be used where appropriate (e.g. in connection with development, environmental problems, etc.) Group 7: Interdisciplinary Models Clearly it is most important to avoid a "babble of models". A second level operation of model reconciliation to form a set of interdisciplinary or inter-model models could therefore be instituted when required. These could either (i) be constructed (automatically by computer) from all the entities common to the models from which it is desired to produce an inter-disciplinary model, or (ii) be constructed by selection based on judgement of the best from each. Group 8: Future-oriented Models A final assumption made in dealing with the earlier groups was that only the current or historical situations would be modeled. There is however no reason why speculative models should not be produced showing the relationships between entities at different points in the future. The modeling activity might then in some ways represent the Delphi method of forecasting. Group 9: Personal Models Perhaps a long term ideal is for a person to be able to "look at" (or interfere) with the basic list of entities in terms of his own model which is his personal "thought file". Each new idea he gets could be usefully reflected in the structure of this file. 
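Option (i) under Group 7, constructing an inter-disciplinary model automatically from the entities common to several models, amounts to a set intersection. The sketch below assumes that each model is held as a set of (entity, entity) links between sequence numbers; the model names, the function names `entities_of` and `inter_model`, and the sample numbers are all hypothetical.

```python
def entities_of(links):
    """Return every entity sequence number that appears in a set of links."""
    return {entity for link in links for entity in link}

def inter_model(models):
    """Build an inter-model view from the entities common to all models.

    Returns the shared entities and, for each link between two shared
    entities, the names of the models that assert that link.
    """
    entity_sets = [entities_of(links) for links in models.values()]
    shared = set.intersection(*entity_sets) if entity_sets else set()
    merged = {}
    for name, links in models.items():
        for a, b in links:
            if a in shared and b in shared:
                merged.setdefault((a, b), set()).add(name)
    return shared, merged

# Two hypothetical discipline models sharing entities 1, 2 and 3.
models = {
    "sociology": {(1, 2), (2, 3), (3, 4)},
    "economics": {(2, 3), (3, 5), (1, 3)},
}
shared, merged = inter_model(models)
```

Links asserted by only one model remain visible in `merged`, so a modeling group could still apply option (ii) and select among them by judgement.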
Group 10: Sub-models

In some cases a particular sub-branch of knowledge may be fragmented by reinterpretation, reconceptualization and redefinition of the same entities. It is then appropriate to use a "sub-modeling" strategy. In other words, instead of requiring "dissident" groups to conform or to divert their energies into a parallel model with differences in a minor area, a sub-model could be used to redefine that area in the dissident group's terms. The sub-model would therefore offer an alternative interpretation.

Group 11: Languages as sub-models

It may be convenient, for some purposes, to consider the relationships between theoretical formulations used in a particular language as a sub-model. The differences between the concepts encountered in Indo-European languages are relatively minor, so that term equivalents pose no great problems, but should it be necessary to enrich the system by incorporating theoretical formulations from other language groups, problems could arise.

Data to be included on each entity

A. Concept Filing Phase (Identification or Registration)

1. Entity Sequence Number (10): Each new conceptual entity, of whatever type (see earlier section), receives a unique number which is the next available in a sequential list. The number therefore contains no significant digits or codes and has no meaning for classification purposes. (It may be an advantage to use the check digit technique.) For practical purposes, it may be convenient to pre-allocate blocks of numbers to different filing centres whenever required. This avoids problems of duplication and speeds up administration. Where duplication does occur, this is eliminated at the modeling stage. One advantage of this sequence number as a concept identifier is that it is not necessary to file a definition or conventional term at the same time.
This is convenient if a new theoretical formulation has been tentatively conceived with known relationships to other concepts but with no clear definition or label yet. It avoids the need to coin doubtful neologisms in order to register the concept. In some cases it may even be an advantage to leave the term defined by its context of relationships, and not to bother attempting to find a suitable term, in which case the sequence number would be used as the only identifier until a suitable terminology for concepts in that domain can be elaborated more systematically.

2. Model Description
3. Cross-reference: Cross-references are used during the modeling phase so that this zone is "free". It is, however, used in this phase to identify the sequence number of :
4. Source Code: There are several possible ways of handling information about the source of information on the entity.
5. Model Descriptor: This is not used during this phase. 6. Relationship Descriptor: This is not used during this phase. 7. Date Codes
8. Status: For administrative purposes it is convenient to have a zone in which codes may be used to indicate that the entity is "under consideration", "of doubtful value", "no longer used", etc. 9. Text: The words or text used for:
would be inserted into this zone. This zone could also be used for any special comments which might be usefully added.

B. Concept Coding Phase (Modeling or Classification)

Many of the zones discussed above are used in this phase but for a different purpose or in order to establish computer records distinct from those created during the earlier phase or by other modeling groups.

1. Entity Sequence Number: This is repeated for each new relationship established within a model and is of course the same as that used in filing the entity in A.1.

2. Model Descriptor
3. Cross-references: This zone supplies the main means by which the relationship of this entity to other entities is indicated for the particular model indicated in 2.1. The sequence number of the other entity is indicated here. In effect, every such "relationship" gives rise to a new computer record (see figure 3). The type of relationship is either implicit, because of the model used, or is described in 6 and 7. 4. Source Code: Depending on the method chosen (see A.4.1, A.4.2, A.4.3, A.4.4, and A.4.5), the source coding would probably either be allocated during the concept filing phase with nothing in this phase, or in this phase with nothing in the previous phase. In the most sophisticated system, it might however be desirable to give:
Source coding during the modeling phase might be particularly helpful in the administrative work of elaborating a model, since it permits members of a modeling group, working independently and in isolation, to "vote" on the insertion or deletion of particular relationships (see A.4.5). Such a postal vote system would be particularly helpful in clarifying with precision just what was under discussion at any point in time.
5. Model Descriptor: This zone is used to indicate which model is to be considered for the entity cross-referenced in 3. In a simplified system this zone would not be required because the assumption would be made that each model was totally isolated from other models. In a more sophisticated system, however, there is a need for a means of expressing relationships between parts of models. For example, it may be that in a certain domain two models are identical or that one forms a subset of the other. In such a case there is little need to duplicate all the relationships in the second model, provided cross-reference between the models is possible.
6. Relationship Descriptor: This zone is used to describe the relationship constituted by the link between this entity and that cross-referenced in 3. Two basic types of relationship descriptors may be distinguished.
7. Date codes
7.1 Date first used: This may be used to indicate the date each relationship between entities was first noted, or alternatively the computer can automatically insert the date on which the relationship was first filed.
7.2 Date last used: This date may be used when the relationship is finally rejected as invalid or unacceptable.
7.3 Retention period: This zone may be used by members of a modeling group to communicate with one another. A member may submit "trial balloon" relationships with a very short (one-cycle) retention period so that others can "see how it looks". Once agreed, the retention period can be set so that relationships periodically come up for review.
8. Status code: For modeling group administration purposes it is convenient to have a zone which may be used to indicate that the relationship is "under consideration", "a tentative proposal", "a firm proposal", "agreed by the group", "requires priority attention", etc.
9. Text: Normally a relationship record should require no text. There is however no reason why this zone should not be used for any text comments on relationships which may seem significant to the modeling group.
C. Term Allocation Phase
1. Entity sequence number: Required as before.
2. Model descriptor A:
3. Cross-reference: Normally this would be "0". It may however be necessary to indicate other entities using the same term (but obviously with a different meaning).
4. Source code: There may be some cases where it is important to indicate the document in which the justification for the unique authoritative term is urged.
5. Model descriptor B: May be required if the cross-reference to a use of the same term in a different model is needed.
6. Relationship descriptor: Not required.
7. Date codes
8. Status code: May be used as in B.8.
9. Text: The words used in the authoritative term are inserted into this zone. Alternatively, the equivalent decimal coding could be inserted, if desired.
Limitation of scope and sources of concepts
1. Scope: The design of the system is sufficiently general that it could be used to order theoretical formulations in any area of knowledge. Such broad coverage would clearly be impracticable, and probably even undesirable, in the foreseeable future. It is useful to re-emphasize that the proposal is not concerned with the areas covered by social science documentation, as there are many such documentation projects. The UNISIST report mentions the parallel programs proposed by such bodies as the International Council of Social Sciences and the International Committee for Social Sciences Documentation. There are numerous equivalent projects at the national level. The object here is to concentrate on theoretical formulations which may or may not be mentioned in a given collection of documents. The priorities proposed would be based on three dimensions:
This does not of course preclude any modeling group from concentrating solely on the formulations of its own school of thought. The main concern however should be to ensure that the system reflects the general framework of theoretical formulations. Highly specialized formulations should not clutter up the modeling activity. Little effort should be made to include minutiae about particular social entities which have not been reflected in more general formulations -- unless such minutiae represent unique evidence of the need for new formulations. The system should be compact and easy to use rather than large and unwieldy, as most documentation systems are.
2. Sources: Guidance in limiting scope can be obtained by concentrating, in the light of the above priorities, on concepts mentioned in such publications as:
2.1 David L. Sills (Ed.), International Encyclopedia of the Social Sciences, Macmillan, 1968.
2.2 Julius Gould and W.L. Kolb (Eds.), A Dictionary of the Social Sciences (compiled under the auspices of Unesco), New York, Free Press, 1964.
2.3 UNESCO. Main Trends of Research in the Social and Human Sciences. Paris, Unesco (Part one: social sciences, 1970, 819 p.; Part two: human sciences, 1972). Also in French edition.
2.4 International Committee for Social Sciences Documentation. International bibliography of the social sciences. London, Tavistock, 4 annual volumes (sociology, political science, economics, social and cultural anthropology).
2.5 Key textbooks in each discipline.
2.6 Specialized multi-lingual dictionaries and glossaries, such as:
2.7 Institute for Scientific Information. Social Sciences Citation Index. Philadelphia. (Included in the SSCI are three separate but related indexes of different periodicity covering the literature of the specified calendar year. Price in 1973: $1,250 (sic).)
2.8 International social science organisations. A preliminary count indicates that possibly some 30 such bodies could contribute in some way to the project.
Concept notation in documents
It has been stressed that this project does not require a complex notation system since each concept is represented by a single sequence number, plus an indication of the model number in question, if required. Nevertheless, since one object of this approach is to permit scholars to refer with precision to a particular concept in their papers, a standard method of indicating such a concept in print is required. A similar problem arises in the natural sciences in distinguishing between different isotopes of the same atom (i.e. cases where slightly different versions of the same atom exist due to differences in atomic weight), where the same symbol does not distinguish between isotopes. The solution adopted is to indicate the atomic weight as a superscript to the standard symbol. In the case of concepts represented in print by the same word, one solution would be to use the sequence number of the concept as superscript to the word: e.g. democracy+251 democracy+942 To avoid confusion with bibliographical references, the number could perhaps be preceded by an asterisk. There is a strong temptation to adopt a technique for uniquely identifying concepts similar to that of the International Standard Book Numbering (ISBN) system (now used on the reverse of all recent book title pages) to give a unique code to each book. This number consists of 10 digits made up of the following parts:
The total length is 10 digits, but the three identifiers only total 9 digits. In order to avoid wastage of numbers or lack of sufficient numbers, publishers with a large book output (of which there are few) have a two- or three-digit identifier so that the title identifier can use six or five digits. A small publisher (of which there are many) has a five- or six-digit identifier so that the title identifier can use three or two digits. The publisher identifier is therefore selected on the basis of the publisher's output, using from two to six digits as required. Hyphen separators are used. The temptation to use this system should however be resisted. While the significance attached to the digits is only "administrative" and has no "theoretical" implications, problems of overflowing the allocated blocks are bound to occur. The system will "bulge" in unpredictable areas as the U.D.C. has done. It is also questionable whether so much significance should be placed on the source which, once the concept has been incorporated, will quickly become irrelevant within the network of other related concepts from other sources.
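The digit budget described above can be made concrete with a short sketch. It is only an illustration under stated assumptions: the function name title_capacity and the constants are hypothetical, and the one-digit width of the remaining (third) identifier is inferred from the arithmetic in the text (9 identifier digits, of which publisher and title identifiers together take 8). It shows how quickly the block available to a source shrinks as its identifier grows, which is the overflow risk noted above.

```python
# Illustrative sketch only: names and the one-digit third identifier are
# assumptions inferred from the text, not part of the original proposal.
IDENTIFIER_DIGITS = 9       # the three identifiers together (the text's figure)
OTHER_IDENTIFIER_DIGITS = 1 # assumption: 9 - (2 + 6) = 1 digit is left over


def title_capacity(publisher_digits: int) -> int:
    """Return how many distinct title numbers fit in a publisher's block."""
    if not 2 <= publisher_digits <= 6:
        raise ValueError("the text describes publisher identifiers of 2 to 6 digits")
    title_digits = IDENTIFIER_DIGITS - OTHER_IDENTIFIER_DIGITS - publisher_digits
    return 10 ** title_digits


# Print the block size for each permitted publisher identifier length.
for digits in range(2, 7):
    print(f"{digits}-digit publisher identifier: {title_capacity(digits)} titles")
```

A source given a six-digit identifier could register only 100 items before its block overflowed, whereas a plain sequence number per concept, as proposed here, has no such pre-allocated blocks to outgrow.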