1992
Network Mapping: Software Possibilities
Summary of the problem
Annex 3 of Visualization of International Relationship Networks (1992)
Summary
The problem is most easily described by analogy. Consider a relational
database whose records describe subway stations and indicate which stations
are directly connected to which other stations (and possibly on which
"line").
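To make the analogy concrete, such records can be held as an undirected adjacency structure. The sketch below assumes each record is a (station, connected station, line) triple; the function name `build_network` and the station and line names are invented for illustration and are not taken from the UIA data.

```python
from collections import defaultdict

# Hypothetical records: (station, directly connected station, line).
RECORDS = [
    ("Central", "Market", "Red"),
    ("Market", "Harbour", "Red"),
    ("Central", "Park", "Blue"),
    ("Park", "University", "Blue"),
]


def build_network(records):
    """Turn (node, node, line) records into an undirected adjacency map.

    Each node maps to {neighbour: set of lines}, so a pair of stations
    served by several lines is stored once, with all of its lines.
    """
    network = defaultdict(dict)
    for a, b, line in records:
        network[a].setdefault(b, set()).add(line)
        network[b].setdefault(a, set()).add(line)  # connections run both ways
    return dict(network)


net = build_network(RECORDS)
print(sorted(net["Central"]))  # ['Market', 'Park']
```

Nothing in this structure records geographic coordinates; only the connections themselves are available to any later layout step.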
(a) The core problem is how to obtain/adapt/develop software which
would generate one or more maps of the subway station network. The principal
constraint is that the map should be comprehensible. It is neither required
nor desirable that the map be subject to the equivalent of "topographic"
constraints; in other words, the positions of the stations should not be
determined by any form of geographic coordinates. Rather, the positions
should be determined topologically and mapped, at least for immediate
purposes, onto a two-dimensional surface.
(b) There are additional problems which can be treated at lower levels
of priority, if at all. They include:
- A second problem is that the database in fact contains over 10,000
  nodes, and ways must be found to segment the network (possibly filtering
  out lower levels of detail) so that maps of individual segments can be
  interrelated. Such maps, in hardcopy form, will be bound together in a
  book to form an "atlas".
- A third problem is that it is desirable to have some means of editorial
  interaction with the map to improve its visual quality.
- A fourth problem is that it is desirable to be able to update the
  database by introducing changes interactively on the map.
- A fifth problem is to open the way to using the map as a menu through
  which the database can be queried for additional information on the nodes.
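For the second problem, one simple way of segmenting, offered here only as an assumption and not as the approach adopted, is to drop weakly connected nodes (a stand-in for "lower levels of detail") and then split what remains into connected components, each a candidate atlas segment. The function `segment` and its `min_degree` threshold are hypothetical.

```python
from collections import deque


def segment(network, min_degree=1):
    """Split an undirected network {node: set of neighbours} into segments.

    Assumption: nodes with fewer than `min_degree` links are treated as
    lower-level detail and filtered out; each connected component of the
    remaining nodes becomes one segment.
    """
    keep = {n for n, nbrs in network.items() if len(nbrs) >= min_degree}
    seen, segments = set(), []
    for start in sorted(keep):
        if start in seen:
            continue
        seen.add(start)
        component, queue = [], deque([start])
        while queue:  # breadth-first search restricted to kept nodes
            node = queue.popleft()
            component.append(node)
            for nbr in network[node]:
                if nbr in keep and nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        segments.append(sorted(component))
    return segments


net = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"},
    "E": {"F"}, "F": {"E"},
}
print(segment(net))                # [['A', 'B', 'C', 'D'], ['E', 'F']]
print(segment(net, min_degree=2))  # [['A', 'B', 'C']]
```

Raising `min_degree` filters out more detail, but removing nodes can also split a segment in two. Links to dropped nodes, and any links between segments, would need separate treatment if the maps of an atlas are to be interrelated.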
Software "modules"
(a) Relational database
The data is currently held and maintained in a Revelation database (version
G2B) running on a Novell network. The database has been specially developed
as a text database with facilities to manage networks of relationships
between the records. It is desirable that when the data is displayed in
map form, interactive changes to the map should be carried back as updates
to the database. But since the prime requirement is for publishable hardcopy
maps, this requirement may be sacrificed in the short term.
It is appropriate to note that Version G2B can now be upgraded to Advanced
Revelation and that some new software has been developed for the upgraded
version only.
(b) Map design
Several approaches may be taken to the problem of map design:
(i) Network analysis: This uses specialized extensions of sociometrics
to take data of the type described above and to position the elements in
relation to each other on the basis of various measures of distance, with
those most connected tending to be placed at the centre of a network and
those least connected at the periphery. The advantage of this approach
is that it endeavours to mirror the network on the basis of its internal
characteristics. A number of software packages exist to perform the necessary
computations. Various ways of describing a network and identifying key
components result from such analysis.
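As an illustration of placing the most connected nodes at the centre and the least connected at the periphery, the sketch below uses closeness centrality ((n - 1) divided by the sum of a node's hop distances to all other nodes) as the distance measure. This is one standard measure chosen for the example, not necessarily one used by the packages mentioned; it assumes a connected network, and the function names are hypothetical.

```python
import math
from collections import deque


def bfs_distances(network, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in network[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def radial_layout(network):
    """Closeness-based radial placement (assumes a connected network).

    The most central node gets radius 0; less central nodes are pushed
    outwards. Angles are simply spread evenly in name order.
    """
    nodes = sorted(network)
    closeness = {}
    for v in nodes:
        total = sum(bfs_distances(network, v).values())
        closeness[v] = (len(nodes) - 1) / total if total else 0.0
    best = max(closeness.values()) or 1.0
    positions = {}
    for i, v in enumerate(nodes):
        radius = 1.0 - closeness[v] / best
        angle = 2 * math.pi * i / len(nodes)
        positions[v] = (radius * math.cos(angle), radius * math.sin(angle))
    return closeness, positions


star = {"H": {"L1", "L2", "L3", "L4"},
        "L1": {"H"}, "L2": {"H"}, "L3": {"H"}, "L4": {"H"}}
closeness, positions = radial_layout(star)
print(closeness["H"], round(closeness["L1"], 3))  # 1.0 0.571
```

Computing hop distances from every node costs one breadth-first search per node, which is one reason such analyses slow down as networks grow.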
The disadvantage of such software is that it has been developed only for
relatively small networks (100 to 300 nodes). Few of the packages are
designed to map the resulting network; data is output only in matrix form
or as indices relating to key elements. More seriously, when such networks
are mapped, the resulting maps, although they reflect the data, are not
designed to make the data more comprehensible (other than in a purely
scientific sense). Such computations can also consume considerable amounts
of computer time, even on fast machines.
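A rough indication of why packages designed for a few hundred nodes struggle at this scale: a symmetric distance matrix holds one value per unordered pair of nodes, so its size grows quadratically with the number of nodes $n$:

```latex
\[
P(n) = \binom{n}{2} = \frac{n(n-1)}{2}, \qquad
P(300) = 44\,850, \qquad
P(5\,000) = 12\,497\,500, \qquad
P(10\,000) = 49\,995\,000 .
\]
```

Going from 300 to 10,000 nodes therefore multiplies the number of pairs by roughly 1,100, and any measure that must be evaluated for every pair grows at least as fast.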
This approach is being explored using test data from the UIA Revelation
database consisting of some 5,000 nodes. The work is currently being done
on a Mac II using software developed at the University of Dartmouth by
Joel Levine of the Department of Mathematical Social Sciences. This software
has not been adapted to run under MS-DOS.
(ii) "Crude mapping": A simplistic approach could be taken.
This would involve positioning the nodes on a grid determined by the subjects
with which they are associated. Such a subject grid (with positions determined
by a 4 character identifier) is in use to categorize the UIA data into
some 3,000 categories. Relationships would then be plotted between the
nodes.
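A minimal sketch of crude mapping follows. The rule by which a 4-character identifier determines a grid position is not described here, so the sketch simply assumes the codes are sorted and assigned to cells row by row in a square grid (some 3,000 categories would need a 55 by 55 grid); nodes sharing a category are spread over a small sub-grid inside their cell. The function `grid_layout` and the example codes are hypothetical.

```python
import math
from collections import defaultdict


def grid_layout(node_category, cell_size=10.0):
    """Place each node inside the grid cell of its subject category.

    Hypothetical scheme: category codes are sorted and assigned to cells
    row by row; the real subject grid may position them differently.
    """
    codes = sorted(set(node_category.values()))
    side = math.ceil(math.sqrt(len(codes)))  # e.g. 3,000 codes -> 55 x 55
    cell_of = {code: divmod(i, side) for i, code in enumerate(codes)}

    members = defaultdict(list)
    for node in sorted(node_category):
        members[node_category[node]].append(node)

    positions = {}
    for code, nodes in members.items():
        row, col = cell_of[code]
        k = math.ceil(math.sqrt(len(nodes)))  # k x k slots inside the cell
        for j, node in enumerate(nodes):
            sub_row, sub_col = divmod(j, k)
            # centre each node in its slot; y grows downwards as on a page
            x = col * cell_size + (sub_col + 0.5) * cell_size / k
            y = row * cell_size + (sub_row + 0.5) * cell_size / k
            positions[node] = (x, y)
    return positions


positions = grid_layout({"n1": "AB12", "n2": "AB12", "n3": "CD34"})
print(positions)  # {'n1': (2.5, 2.5), 'n2': (7.5, 2.5), 'n3': (15.0, 5.0)}
```

Relationships would then be drawn as straight lines between the returned positions, so the shape of the network plays no part in the layout.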
In this case comprehensibility is achieved through the link to the subject
grid, not through the shape of the network. Use of a grid could
severely undermine the memorability of the network. It would however be
relatively easy to develop and quick to run. A key question would be what
kind of interaction it would be possible to have with such a map and whether
it would be possible to shift from a detailed focus on a specialized cell
of the grid to a wider focus and back (a zoom facility).
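One possible form of the zoom facility, assuming every node already has (x, y) coordinates on the subject grid, is to show only the nodes inside a chosen window together with any links that touch it; widening the window zooms out. The function `zoom` is a hypothetical name.

```python
def zoom(positions, edges, x0, y0, x1, y1):
    """Return nodes inside the window [x0, x1] x [y0, y1] and edges touching it.

    Edges with only one end inside the window are kept, so links leading
    out of a detailed cell remain visible as stubs when zoomed in.
    """
    inside = {n for n, (x, y) in positions.items()
              if x0 <= x <= x1 and y0 <= y <= y1}
    visible = [(a, b) for a, b in edges if a in inside or b in inside]
    return inside, visible


positions = {"n1": (2.5, 2.5), "n2": (7.5, 2.5), "n3": (15.0, 5.0)}
edges = [("n1", "n2"), ("n2", "n3")]
inside, visible = zoom(positions, edges, 0, 0, 10, 10)  # one cell only
print(sorted(inside), visible)  # ['n1', 'n2'] [('n1', 'n2'), ('n2', 'n3')]
```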
(iii) Topological manipulation: In this approach, the network of
relationships between nodes would be simplified using topological constraints.
For example a string of interlinked nodes would be represented by a straight
line. The position of the nodes on the line might be equidistant or determined
by some logarithmic function based on the distance from the centre of the
line. The aim would be to introduce symmetry elements into the data so
that it acquires a distinct and memorable pattern or shape. Some of the
required algorithms presumably correspond to those used in pattern
recognition.
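The "string of interlinked nodes" case can be sketched as follows: find maximal chains whose interior nodes have exactly two links, then space each chain's nodes equidistantly on the straight line between its end positions. The function names are hypothetical, and where the chain ends are placed is left to some other step.

```python
def find_chains(network):
    """Find strings of interlinked nodes that could be drawn as straight lines.

    A chain starts and ends at an 'anchor' (a node whose number of links is
    not 2) and passes only through nodes with exactly two links. Closed loops
    made only of two-link nodes have no anchor and are ignored here.
    """
    anchors = {n for n, nbrs in network.items() if len(nbrs) != 2}
    chains, walked = [], set()
    for start in sorted(anchors):
        for first in sorted(network[start]):
            if (start, first) in walked:
                continue  # already found from the other end
            path, prev, node = [start], start, first
            while node not in anchors:
                path.append(node)
                # a two-link node has exactly one neighbour other than prev
                prev, node = node, next(n for n in network[node] if n != prev)
            path.append(node)
            walked.add((node, prev))  # mark the reverse direction as done
            if len(path) > 2:  # keep only chains with interior nodes
                chains.append(path)
    return chains


def place_on_line(chain, p_start, p_end):
    """Space a chain's nodes equidistantly between its two end positions."""
    (x0, y0), (x1, y1) = p_start, p_end
    steps = len(chain) - 1
    return {node: (x0 + (x1 - x0) * i / steps, y0 + (y1 - y0) * i / steps)
            for i, node in enumerate(chain)}


net = {"A": {"b"}, "b": {"A", "c"}, "c": {"b", "D"},
       "D": {"c", "E", "F"}, "E": {"D"}, "F": {"D"}}
chains = find_chains(net)
print(chains)  # [['A', 'b', 'c', 'D']]
print(place_on_line(chains[0], (0, 0), (3, 0))["b"])  # (1.0, 0.0)
```

Replacing the linear fraction `i / steps` with another monotone function of `i` would give non-uniform spacing, such as the logarithmic option mentioned above.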
(c) Plotting
Once coordinates have been determined, software is required to plot the
network, whether onto the screen or onto a graph plotter. Many packages
exist for this purpose. A distinction should however be made here between
adequate quality plots (for working purposes) and high-quality plots for
publication in book form. The latter are discussed under (f) below.
The problem in plotting is to be able to introduce distinguishing elements
into the plot. These may include variations in line thickness (corresponding
to some measure of importance or proximity), variations in node size (corresponding
to the number of connections to the node) and the introduction of identifying
labels for the nodes.
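The distinguishing elements listed above can be written directly into a plot description. The sketch below emits SVG, chosen only because it is easy to inspect; a plotter language such as HP-GL would carry the same information. Taking line width directly from an edge weight and node radius from the number of connections are illustrative assumptions, and `to_svg` is a hypothetical name.

```python
from xml.sax.saxutils import escape


def to_svg(positions, edges, width=400, height=300):
    """Draw a network as SVG with line width, node size and labels.

    positions: node -> (x, y); edges: (a, b, weight) triples. Line width
    follows the edge weight, node radius grows with the number of links,
    and each node carries a text label.
    """
    degree = {n: 0 for n in positions}
    for a, b, _ in edges:
        degree[a] += 1
        degree[b] += 1
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{width}" height="{height}">']
    for a, b, weight in edges:
        (x1, y1), (x2, y2) = positions[a], positions[b]
        parts.append(f'<line x1="{x1}" y1="{y1}" x2="{x2}" y2="{y2}" '
                     f'stroke="black" stroke-width="{weight}"/>')
    for n, (x, y) in positions.items():
        r = 2 + 2 * degree[n]  # more connections -> larger node
        parts.append(f'<circle cx="{x}" cy="{y}" r="{r}" '
                     f'fill="white" stroke="black"/>')
        parts.append(f'<text x="{x + r + 2}" y="{y}" font-size="10">'
                     f'{escape(str(n))}</text>')
    parts.append("</svg>")
    return "\n".join(parts)


svg = to_svg({"A": (50, 50), "B": (150, 50), "C": (100, 120)},
             [("A", "B", 3), ("A", "C", 1)])
print(svg)
```

Because the plot is generated from the processed data rather than drawn by hand, any of the layout techniques above can feed the same function.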
A key requirement is that the plot be made from the data as processed by
one of the above techniques, rather than from data which is manually input.
A distinction must also be made between a curve-fitting approach and one
in which the lines pass exactly through the nodes, as is required here. A distinction also
needs to be made between plotting a graph (from left to right) and plotting
a network in which there is no privileged direction. The latter form is
more characteristic of CAD programs (see below).
(d) Drawing
It is desirable to move towards an interactive approach to the data. In
other words, once a plot is made for a segment of the overall network,
editors should be able to modify the network. Such modifications might
take one of two forms. The first would consist of simply moving portions
of the plot to make it more comprehensible, making room for labels and
improving the aesthetics. The second might also involve the capacity to
add or delete features from the network. It would of course be highly
desirable that such changes be carried back as updates to the relational
database. This can raise severe problems of compatibility between the
relational database and the drawing/plotting software, whether in terms of
software or of intermediate files. Both kinds of editing are available in
many CAD programs. It is however important to recognize that the CAD
software is here used to "design" logical or topological constructs rather
than buildings or mechanical parts. This is not a limitation; if anything,
it may permit the use of simpler (and cheaper) CAD software.
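The two kinds of modification differ in what they imply for the database: moving nodes is cosmetic, while adding or deleting links calls for updates. A minimal sketch, assuming the map state before and after editing is available as positions plus an undirected edge list, separates the two. The function names are hypothetical, and nodes added or removed without links are not tracked here.

```python
def _edge_set(edges):
    """Undirected edges as order-independent frozensets."""
    return {frozenset(e) for e in edges}


def classify_edits(old_pos, new_pos, old_edges, new_edges):
    """Split map edits into layout-only moves and structural link changes."""
    old, new = _edge_set(old_edges), _edge_set(new_edges)
    moved = sorted(n for n in new_pos
                   if n in old_pos and new_pos[n] != old_pos[n])
    return {
        "moved": moved,  # drawing only; no database update needed
        "added": sorted(tuple(sorted(e)) for e in new - old),    # inserts
        "removed": sorted(tuple(sorted(e)) for e in old - new),  # deletes
    }


changes = classify_edits(
    old_pos={"A": (0, 0), "B": (1, 0)},
    new_pos={"A": (0, 0), "B": (2, 0)},
    old_edges=[("A", "B"), ("B", "C")],
    new_edges=[("B", "A"), ("C", "D")],
)
print(changes)
# {'moved': ['B'], 'added': [('C', 'D')], 'removed': [('B', 'C')]}
```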
It is appropriate to note that the variant of CAD software used for
interactive printed circuit board (PCB) design has many features of value to
the present application, especially the "auto-router" feature, which
positions connections on the circuit board in the most economical manner
(avoiding cross-overs, etc.). Unfortunately, these positioning criteria do
not make for maximum comprehensibility.
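Cross-overs can at least be counted, which gives a simple number for comparing alternative layouts of the same segment. The sketch below counts pairs of straight-line links that properly cross, using the standard orientation test; it is not how any particular auto-router works, and the function names are hypothetical.

```python
from itertools import combinations


def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1 or 0 if collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def count_crossings(positions, edges):
    """Count pairs of edges whose straight-line drawings properly cross.

    Edges sharing an endpoint are skipped; touching or collinear overlaps
    are not counted in this simplified sketch.
    """
    total = 0
    for (a, b), (c, d) in combinations(edges, 2):
        if {a, b} & {c, d}:
            continue
        p, q, r, s = positions[a], positions[b], positions[c], positions[d]
        if (_orient(p, q, r) * _orient(p, q, s) < 0
                and _orient(r, s, p) * _orient(r, s, q) < 0):
            total += 1
    return total


square = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (0, 1)}
print(count_crossings(square, [("A", "C"), ("B", "D"), ("A", "B")]))  # 1
```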
(e) Interface software
In the case of Advanced Revelation, there exists a software product,
CAD/Base, which offers "complete integration of CAD drawings with a database
environment", via industry-standard DXF files. The drawing is viewed as a
Revelation file and the drawing elements as Revelation records and fields.
The drawing exists as a master file in both the Revelation and CAD
environments. Changes in one environment are reflected in the other
automatically, with no intermediate file conversion required.
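How CAD/Base maps Revelation records to drawing entities is not described here. The sketch below only illustrates the file-format side: writing nodes and links as an ASCII DXF file containing just an ENTITIES section, a minimal form that some DXF readers accept (not every program does). The function name and the layer names LINKS and LABELS are hypothetical.

```python
def to_dxf(positions, edges, text_height=2.5):
    """Write a minimal ASCII DXF: LINE entities for links, TEXT for labels.

    Each group is a code line followed by a value line: 0 = entity type,
    8 = layer, 10/20/30 = first point, 11/21/31 = second point,
    40 = text height, 1 = text string.
    """
    out = ["0", "SECTION", "2", "ENTITIES"]
    for a, b in edges:
        (x1, y1), (x2, y2) = positions[a], positions[b]
        out += ["0", "LINE", "8", "LINKS",
                "10", str(x1), "20", str(y1), "30", "0.0",
                "11", str(x2), "21", str(y2), "31", "0.0"]
    for name, (x, y) in positions.items():
        out += ["0", "TEXT", "8", "LABELS",
                "10", str(x), "20", str(y), "30", "0.0",
                "40", str(text_height), "1", str(name)]
    out += ["0", "ENDSEC", "0", "EOF"]
    return "\n".join(out) + "\n"


dxf = to_dxf({"A": (0.0, 0.0), "B": (10.0, 0.0)}, [("A", "B")])
print(dxf)
```

Keeping links and labels on separate layers would let an editor hide or move labels without disturbing the network itself.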
Clearly this offers interesting opportunities for using the network map
as a menu through which users can select individual nodes and immediately
access additional text data on them.
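Using the map as a menu comes down to finding which node a user has pointed at and then fetching its record. A minimal sketch, assuming node positions in drawing units and a lookup table standing in for the text database, picks the nearest node within a tolerance. The function `pick_node`, the tolerance and the record texts are hypothetical.

```python
import math


def pick_node(positions, x, y, tolerance=5.0):
    """Return the node nearest to a click at (x, y), or None if none is close."""
    best, best_d = None, tolerance
    for node, (nx, ny) in positions.items():
        d = math.hypot(nx - x, ny - y)
        if d <= best_d:
            best, best_d = node, d
    return best


# Hypothetical stand-in for the text records held in the database.
RECORDS = {"A": "Full text record for A", "B": "Full text record for B"}
positions = {"A": (10.0, 10.0), "B": (40.0, 10.0)}

node = pick_node(positions, 12.0, 9.0)
print(node, "->", RECORDS.get(node))  # A -> Full text record for A
```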
(f) High-quality graphic output
One objective is the production of maps to be printed in book form. To
achieve this, one approach might be to produce output in a form which can
be handled by PC-TeX to create files for output on a high-quality laser
printer.
(h) Integration of features
It is possible that CAD/Base offers an appropriate means of integrating
the different features discussed above (except the last). It is also possible
that such a product, which is relatively expensive, would be "overkill",
and that a more compact approach would be more suitable and easier to make
available to others. If the emphasis is on the simpler strategy of generating
hardcopy, this would certainly be the case. To the extent that interaction
with the data is desirable, more features would be required, even though
only a selection of standard CAD features would be necessary.
For the user, there is obviously great merit in ease of use as an adjunct
to normal text editing procedures. Ideally such a package would bear some
resemblance to the more sophisticated forms of "outliner", such
as MORE and INSPIRATION running on Apple machines. In these an essentially
hierarchical outline of topics can be opened up into standard text processing
or converted into bullet charts. What is required is an equivalent which
is tied into a relational database environment. The different approaches
to network "map design" noted above might then be options in
the way the data was manipulated for presentation, as is the case in standard
business graphics (bar charts, pie charts, etc).