30th October 2006 | Draft
Generating a Million Questions from UIA Databases
Problems, Strategies, Values
Also published in modified form in Statistics, Visualizations and Patterns
(Vol 5 of the Yearbook of International Organizations, Munich, K G Saur Verlag,
6th edition, 2006/2007, as sections 10.1.1 and 10.1.2)
Background
The experiment described below follows from an initial interest of the German
Research Centre for Artificial Intelligence (DFKI), in support of the
questions project of the international nonprofit organization Dropping
Knowledge -- as clarified during a workshop on the online databases of the
Union of International Associations (Saarbrucken, 8 December 2005). Dropping
Knowledge subsequently appropriated this information as the basis for
establishing an online web facility to enable people worldwide to ask questions
and to be exposed to answers -- thereby creating a "Living Library". The
categorization of the questions was undertaken using the ontology developed by
the UIA (cf Enabling a Living Library, 2006).
The concern here, in contrast, is whether it was possible to generate a Questions
database from three long-established UIA databases: World
Problems-Issues, Global Strategies-Solutions,
and Human Values. These databases
are part of the online Encyclopedia
of World Problems and Human Potential, originally initiated in collaboration
with Mankind 2000, whose development was most recently funded by the European
Commission. The databases are integrated with others on international organizations,
international meetings, biographies and bibliographies (cf Yearbook
of International Organizations, International
Congress Calendar).
The thousands of problems, strategies and values identified from the documented
preoccupations of the network of international organizations (governmental
and nongovernmental) provide a relatively objective focus for the generation
of questions associated with those preoccupations -- or implicit in them.
Clearly a particular interest in this experiment is to determine in what ways
the result of generating questions could be meaningful and significant. The
work builds on the possibilities of the use of such databases for simulations
(cf Simulating
a Global Brain: using networks of international organizations, world problems,
strategies, and values, 2001).
WH-questions
There is an extensive literature on what are termed “WH-questions”, namely questions of the type: How? Why? Where? What? Which? When? Who? Further comments on studies in relation to such questions are noted below.
The Questions database was first generated experimentally in December 2005, and then more comprehensively in October 2006, in each case by applying a template of the WH-questions to the titles of Problems, Strategies and Values. This can be done by embedding the "seed title" (XXXX) in a suitable template phrase. For example:
- How is XXXX caused?
- Who is responsible for XXXX?
- Where does XXXX occur?
- When does XXXX occur?
- What is XXXX?
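The embedding of a seed title in a template phrase can be illustrated by a minimal Python sketch. It is not the procedure actually used to build the database: the function names and the seed "deforestation" are hypothetical, and the five templates are simply those listed above, each split into a WH-word, a phrase before the seed and a phrase after it.

```python
# Illustrative templates only: (WH-word, phrase before the seed, phrase after the seed).
TEMPLATES = [
    ("How", "is", "caused?"),
    ("Who", "is responsible for", "?"),
    ("Where", "does", "occur?"),
    ("When", "does", "occur?"),
    ("What", "is", "?"),
]


def make_question(wh, phrase1, seed, phrase2):
    """Embed a seed title (the XXXX of the text) between the two template phrases."""
    head = f"{wh} {phrase1} {seed}"
    # A bare "?" attaches directly to the seed; longer endings need a space.
    return head + phrase2 if phrase2 == "?" else f"{head} {phrase2}"


def generate(seed):
    """Apply every template to one seed title."""
    return [make_question(wh, p1, seed, p2) for wh, p1, p2 in TEMPLATES]


# "deforestation" is a hypothetical seed title, not taken from the databases.
questions = generate("deforestation")
for question in questions:
    print(question)
```

A template chosen by a user could be tried in the same way by adding one more tuple to the list.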
The range of templates is illustrated by the following table.
Templates used experimentally to generate questions (** = not finally used)

| Source database | WH-query | Phrase-1 | Seed | Phrase-2 |
|---|---|---|---|---|
| World Problems-Issues (13 templates applied) | How | much | XXXX | is there? |
| | How | does | XXXX | happen? (**) |
| | How | is | XXXX | caused? (**) |
| | Why | does | XXXX | happen? |
| | Why | give priority to | XXXX | ? |
| | Why | does God allow | XXXX | to happen? (**) |
| | Why | be concerned by | XXXX | ? (**) |
| | Where | does | XXXX | occur? |
| | What | is | XXXX | ? |
| | What | causes | XXXX | ? |
| | What | results in | XXXX | ? (**) |
| | Who | causes | XXXX | ? |
| | Who | is responsible for | XXXX | ? |
| | Who | is concerned about | XXXX | ? |
| | When | does | XXXX | occur? |
| | When | will | XXXX | occur? |
| | When | did | XXXX | arise? |
| | Which | kind of | XXXX | ? |
| Global Strategies-Solutions (14 templates applied) | How | can | XXXX | be enabled? |
| | Why | is | XXXX | unsuccessful? |
| | Why | give priority to | XXXX | ? |
| | Where | is | XXXX | undertaken? |
| | Where | is | XXXX | successful? |
| | What | is required for | XXXX | ? |
| | What | causes | XXXX | to fail? |
| | Who | undertakes | XXXX | ? |
| | Who | is responsible for | XXXX | ? |
| | Who | is concerned about | XXXX | ? |
| | When | is | XXXX | undertaken? |
| | When | will be | XXXX | undertaken? |
| | When | was | XXXX | undertaken? |
| | Which | kind of | XXXX | ? |
| Human Values (constructive or destructive -- 9 templates applied) | How | is | XXXX | elicited? |
| | Why | is | XXXX | valued? |
| | Why | give priority to | XXXX | ? |
| | Where | is | XXXX | found? |
| | What | is | XXXX | ? |
| | Who | exemplifies | XXXX | ? |
| | Who | values | XXXX | ? |
| | When | is | XXXX | evident? |
| | Which | kind of | XXXX | ? |
| Human Values (polarities -- 10 templates applied) | How | are | XXXX | related? |
| | How | can | XXXX | be reconciled? |
| | How | can | XXXX | be transcended? |
| | Why | is the | XXXX | relation so challenging? |
| | Where | are | XXXX | reconciled? |
| | What | transcends | XXXX | ? |
| | Who | embodies | XXXX | ? |
| | Who | exemplifies the | XXXX | ambiguity? |
| | When | are | XXXX | transcended? |
| | Which | kind of | XXXX | relationship? |
As is clear from the table above, different templates were used both according to the source database and according to the WH-Question. Since many Problems and Strategies have a number of alternative titles (notably employing synonyms), these too have been used as seeds for the generation of alternative titles for a question -- effectively constituting alternative formulations of the same question (but clustered together in the Question entry). Although they may be accessed through their keywords, they are not treated as distinctly profiled questions.
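The clustering of alternative formulations within a single Question entry might be sketched as follows. This is an assumption-laden illustration rather than the actual implementation: the class `QuestionEntry`, the identifier "P0001" and the seed titles are hypothetical, and only two of the Problems templates from the table are used.

```python
from dataclasses import dataclass, field

# Two of the Problems templates from the table above, written as format strings.
PROBLEM_TEMPLATES = {
    "Why": "Why does {seed} happen?",
    "Where": "Where does {seed} occur?",
}


@dataclass
class QuestionEntry:
    """One Question entry: a single seed entity combined with a single template."""
    seed_id: str                  # hypothetical identifier of the source entry
    wh_type: str
    formulations: list = field(default_factory=list)  # principal title first


def build_entries(seed_id, titles, templates):
    """Create one entry per template; alternative titles become alternative
    formulations inside that entry rather than separate entries."""
    entries = []
    for wh_type, template in templates.items():
        entry = QuestionEntry(seed_id, wh_type)
        for title in titles:
            entry.formulations.append(template.format(seed=title))
        entries.append(entry)
    return entries


# Hypothetical seed with a principal title and one synonym.
entries = build_entries("P0001", ["unemployment", "joblessness"], PROBLEM_TEMPLATES)
for entry in entries:
    print(entry.wh_type, entry.formulations)
```

Two titles and two templates therefore give two entries, each holding two formulations, which is the sense in which alternative titles do not add to the count of distinct questions.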
Preliminary results
The very preliminary results in generating these questions in December 2005 are indicated in the following table.
Preliminary results (December 2005)

| . | Seed entities: Total | Seed entities: Selected | WH-templates used | Questions generated: Main | Questions generated: WH-Variants |
|---|---|---|---|---|---|
| Problems | 59205 | 12995 | 13 | 168935 | 239252 |
| Strategies | 42032 | 12848 | 14 | 179872 | 167426 |
| Values | 3257 | 3209 | 9 / 10 | 29111 | 16470 |
| Totals | 104494 | 29052 | . | 377918 | 423148 |
| Total questions (Main + WH-Variants) | . | . | . | . | 801066 |
Indication of distribution of seed entities by type

| Type | Problems-Issues profiles: 1996 | 2000 | % | Problems-Issues links: 1996 | 2000 | % | Strategies-Solutions profiles: 1996 | 2000 | % | Strategies-Solutions links: 1996 | 2000 | % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 0 | 196 | n.a. | 0 | 3,507 | n.a. | 0 | 1,518 | n.a. | 0 | 16,767 | n.a. |
| B | 170 | 187 | 10% | 5,300 | 7,090 | 34% | 158 | 154 | -3% | 3,697 | 4,253 | 15% |
| C | 575 | 722 | 26% | 13,816 | 19,347 | 40% | 1,100 | 1,089 | -1% | 17,096 | 25,206 | 47% |
| D | 2,162 | 2,740 | 27% | 30,613 | 52,451 | 71% | 3,315 | 3,452 | 4% | 19,374 | 43,329 | 124% |
| E | 3,857 | 5,378 | 39% | 29,626 | 52,587 | 78% | 3,008 | 5,298 | 76% | 11,092 | 50,677 | 357% |
| F | 3,072 | 3,917 | 28% | 38,625 | 61,604 | 59% | 1,382 | 1,972 | 43% | 7,015 | 19,580 | 179% |
| G | 2,153 | 30,279 | 1306% | 5,979 | 47,112 | 688% | 7,685 | 13,107 | 71% | 3,604 | 69,059 | 1,816% |
| Other | 214 | 12,716 | 5,842% | 905 | 26,255 | 2,801% | 12,850 | 6,105 | -52% | 61,129 | 34,070 | -44% |
| Total | 12,203 | 56,135 | 360% | 124,864 | 269,953 | 116% | 29,498 | 32,695 | 11% | 123,007 | 262,941 | 114% |
Indication of seed entity relationships

| . | Hierarchical links: Broader | Narrower | Related | Functional links: Aggravating | Aggravated by | Reducing | Reduced by |
|---|---|---|---|---|---|---|---|
| Problems | 26403 | 35500 | 14264 | 31024 | 31105 | 1507 | 1529 |
| Strategies | 27134 | 32541 | 3010 | 3302 | 2902 | 17826 | 16911 |
| Values | . | 11392 | . | . | . | . | . |
| Totals | . | . | . | . | . | . | . |
The subsequent generation of the Questions in October 2006 gave the following results:
Final results of question generation (October 2006)
| . | Problems: Main | Problems: WH-Variants | Strategies: Main | Strategies: WH-Variants | Values: Main | Values: WH-Variants | Value-Polarities: Main | Value-Polarities: WH-Variants | Totals: Main | Totals: WH-Variants | All |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Seed entities | 45892 | | 31055 | | 2978 | | 229 | | | | 80154 |
| WH-templates | 7 | 6 | 7 | 7 | 7 | 2 | 7 | 3 | 28 | 18 | 46 |
| Main | 321244 | - | 217385 | - | 20846 | - | 1603 | - | 561078 | - | 561078 |
| (Alternative titles) | 133917 | 114786 | 107422 | 107422 | 9807 | 2802 | - | - | - | - | - |
| WH-Variants | - | 275352 | - | 217385 | - | 5956 | - | 687 | - | 499380 | 499380 |
| Total questions | 321244 | 275352 | 217385 | 217385 | 20846 | 5956 | 1603 | 687 | 561078 | 499380 | 1060458 |
| Broader | 433244 | 371352 | 370174 | 370174 | 79765 | 22790 | 0 | 687 | 883183 | 765003 | 1648186 |
| Narrower | 346577 | 297066 | 284396 | 284396 | 0 | 0 | 79744 | 34176 | 710717 | 615638 | 1326355 |
| Related | 113302 | 97116 | 33768 | 33768 | 0 | 0 | 0 | 34863 | 147070 | 165747 | 312817 |
| Total hierarchical | 893123 | 765534 | 688338 | 688338 | 79765 | 22790 | 79744 | 69726 | 1740970 | 926888 | 2667858 |
| Aggravates | 235347 | 201726 | 42196 | 42196 | 77 | 22 | 0 | 0 | 277620 | 243944 | 521564 |
| Aggravated by | 233989 | 200562 | 42637 | 42637 | 0 | 0 | 0 | 0 | 276626 | 243199 | 519825 |
| Reduces | 11298 | 9684 | 174125 | 174125 | 0 | 0 | 0 | 0 | 185423 | 183809 | 369232 |
| Reduced by | 11340 | 9720 | 175469 | 175469 | 0 | 0 | 0 | 0 | 186809 | 185189 | 371998 |
| Total functional | 491974 | 421692 | 434427 | 434427 | 77 | 22 | 0 | 0 | 926478 | 856141 | 1782619 |
| Strategies | 163289 | 139962 | 0 | 0 | 336966 | 96276 | 182 | 78 | 500437 | 236316 | 736753 |
| Problems | 0 | 0 | 163205 | 163205 | 256585 | 73310 | 20090 | 8610 | 439880 | 245125 | 685005 |
| Values | 223293 | 191394 | 337134 | 337134 | 0 | 0 | 0 | 0 | 560427 | 528528 | 1088955 |
| Total cross-database | 386582 | 331356 | 500339 | 500339 | 593551 | 169586 | 20272 | 8688 | 1500744 | 1009969 | 2510713 |
| Total relationships | 1771679 | 1518582 | 1623104 | 1623104 | 673393 | 192398 | 100016 | 78414 | 4168192 | 3412498 | 7580690 |
Remarks
The above results are of course extremely preliminary. Some clarifications regarding the above table are appropriate:
- the application of a single WH-template of each of the 7 types (Who, Where,
etc) to each seed entity (Problem, Strategy, or Value) gives rise to the column
labelled "main"
- a distinction is made in the table between "values" (constructive and destructive)
and "value-polarities", corresponding to the exploration of the use of the
latter as a significant device for clustering values
- the application of any additional WH-templates (6 in the case of Problems) to each seed entity gives rise to the column labelled "WH-variants"
- the results for "main" and "WH-variants" may be the same (as in the case of Strategies) because the number of templates applicable is the same
- the "alternative titles" do not give rise to separate question
entries for statistical purposes
- the labelling of the functional relationships is in practice different in the case of Problems and Strategies, although in each case the systemic notion of facilitating or constraining is the same
The results could be substantively affected by:
- Increasing or reducing the number of original entities selected for application of the WH-templates
- Increasing or reducing the number of WH-question templates themselves
- Culling questions after generation
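The sensitivity of the totals to these choices can be checked with simple arithmetic, since each question count in the final table is the number of seed entities multiplied by the number of templates applied. The sketch below uses only the seed and template counts from that table. The function name is hypothetical, and the "what if" case at the end is an illustration, not a reported result.

```python
# Seed entities and (main, WH-variant) template counts, as given in the final table.
SEEDS = {"Problems": 45892, "Strategies": 31055, "Values": 2978, "Value-Polarities": 229}
TEMPLATES = {
    "Problems": (7, 6),
    "Strategies": (7, 7),
    "Values": (7, 2),
    "Value-Polarities": (7, 3),
}


def question_counts(seeds, templates):
    """Return (main, WH-variants, total) as seeds multiplied by templates."""
    main = sum(count * templates[name][0] for name, count in seeds.items())
    variants = sum(count * templates[name][1] for name, count in seeds.items())
    return main, variants, main + variants


main, variants, total = question_counts(SEEDS, TEMPLATES)
print(main, variants, total)  # 561078 499380 1060458

# Hypothetical "what if": dropping one variant template for Problems
# removes one question per Problem seed (45892 questions).
fewer = dict(TEMPLATES, Problems=(7, 5))
print(question_counts(SEEDS, fewer)[2])  # 1014566
```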
Integration into UIA set of databases
The Questions database, with its 1,060,458 Questions, was integrated into
the UIA set of online databases by Tomáš J.
Fülöpp in October 2006. It is freely accessible
over the web. This integration has the advantage of using the common search
and visualization interfaces developed for the other databases (including World
Problems, Global Strategies, Human Values, International Organizations, International
Meetings, etc). The format of a displayed question record is as follows. Information
on various types of relationships between questions clearly depends on the
presence of such information in the seed entry in the source database.
Output / Displayed Record
Source database: The questions have been generated from the titles of entries in three different databases: World Problems-Issues (P), Global Strategies-Solutions (S), or Human Values (V)
Seed title: Questions have been generated by taking each of the (possibly several alternative) titles of a Problem, a Strategy or a Value. The title associated with this entry is indicated here
Type code: In the source database (Problems, Strategies, Values), each entry is allocated a type code. Typically, in the case of Problems or Strategies, the lowest letters of the alphabet indicate the most generic entries whereas the higher letters in the alphabet indicate more specific entries. In the case of the Values database, entries of Type C are associated with Constructive Values, those of Type D with Destructive Values and those of Type P with Value Polarities
WH-Question type: The following classical types of generic "WH-questions" are used to generate the Questions in this database: 1=When? 2=Where? 3=Which? 4=How? 5=What? 6=Who? or 7=Why? The relevant one for this Question entry is indicated here.
Question variant: To distinguish between the (possibly several) alternative titles, each is given a single digit number. The first (1) is that which is presented as the principal title of the entry in the source database. Questions of different types (What? Where? etc) are generated from each such title, and all of them keep that single digit number.
WH-Question family: These are the titles of all the Questions generated from a single title of the Problem, Strategy or Value entry from which the Question derived. They therefore all share the same "Question variant" (1-9) but are each associated with a different WH-Question type (When? Where? Which? How? What? Who? Why?)
---- Relationships ----
Broader questions: These are Questions that are more general, or more contextual, than that of the entry. They correspond to Questions that have been generated, as appropriate, from any broader Problem, Strategy or Value in the corresponding databases.
Narrower questions: These are Questions that are more specific than that of the entry. They correspond to Questions that have been generated, as appropriate, from any narrower Problem, Strategy or Value in the corresponding databases.
Related questions: These are Questions that are associated in some non-specific way with the Question of this entry. They correspond to related entities in the seed entry in the source database whether a Problem or a Strategy.
Aggravates (P) / Constrains (S): These are Questions that may impose constraints on those of the same type with which they are linked in this way. They correspond to Questions that have been generated, as appropriate, from any equivalent Problem or Strategy in the corresponding databases.
Aggravated by (P) / Constrained by (S): These are Questions that may be constrained in some way by those of the same type with which they are linked in this way. They correspond to Questions that have been generated, as appropriate, from any equivalent Problem or Strategy in the corresponding databases.
Reduces (P) / Facilitates (S): These are Questions that may reduce or facilitate those of the same type with which they are linked in this way. They correspond to Questions that have been generated, as appropriate, from any equivalent Problem or Strategy in the corresponding databases.
Reduced by (P) / Facilitated by (S): These are Questions that may be reduced or facilitated in some way by those of the same type with which they are linked in this way. They correspond to Questions that have been generated, as appropriate, from any equivalent Problem or Strategy in the corresponding databases.
Strategy questions: These are Questions generated from any Strategy entry associated with the Problem entry from which this Question was generated in the Problems source database. This field is not relevant in the case of Questions generated from the Strategies database.
Problem questions: These are Questions generated from any Problem entry associated with the Strategy entry from which this Question was generated in the Strategies source database. This field is not relevant in the case of Questions generated from the Problems database.
Value questions: These are Questions generated from any Values entry associated with the Problem or Strategy entry from which this Question was generated. It is not relevant in the case of entries from the Values database.
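The fields of the displayed record can be summarized as a data structure. The following Python sketch is one possible reading of the description above, not the schema of the actual database: the class and field names are hypothetical, and only the WH-type codes (1 to 7), the source codes (P, S, V) and the variant range (1-9) are taken from the text.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class WHType(IntEnum):
    """WH-Question type codes as listed in the record description."""
    WHEN = 1
    WHERE = 2
    WHICH = 3
    HOW = 4
    WHAT = 5
    WHO = 6
    WHY = 7


@dataclass
class QuestionRecord:
    source_db: str      # "P" (Problems), "S" (Strategies) or "V" (Values)
    seed_title: str     # the title from which the question was generated
    type_code: str      # type code of the seed entry in its source database
    wh_type: WHType
    variant: int        # 1 = principal title of the seed entry
    title: str          # the generated question itself
    # Relationship fields hold references to other questions (hypothetical names).
    broader: list = field(default_factory=list)
    narrower: list = field(default_factory=list)
    related: list = field(default_factory=list)
    aggravates: list = field(default_factory=list)     # Constrains, for Strategies
    aggravated_by: list = field(default_factory=list)  # Constrained by
    reduces: list = field(default_factory=list)        # Facilitates
    reduced_by: list = field(default_factory=list)     # Facilitated by
    strategy_questions: list = field(default_factory=list)
    problem_questions: list = field(default_factory=list)
    value_questions: list = field(default_factory=list)

    def __post_init__(self):
        if self.source_db not in ("P", "S", "V"):
            raise ValueError("source_db must be 'P', 'S' or 'V'")
        if not 1 <= self.variant <= 9:
            raise ValueError("variant must be a single digit from 1 to 9")


# Hypothetical example record.
record = QuestionRecord("P", "unemployment", "C", WHType.WHY, 1,
                        "Why does unemployment happen?")
print(record.wh_type.name, int(record.wh_type))
```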
Comments
Fundamental to this exploration are the following issues:
- Problems as themselves constituting a form of "question" calling
for some form of "answer" -- although, as "questions",
they may themselves merit questioning, thus clarifying a challenging problem
by asking more searching questions
- Strategies as themselves constituting a form of "answer" -- although,
as "answers" that may themselves merit questioning -- by "putting
them to the question"
- Values motivating both acknowledgement of any "question" or
sensitivity to the need for an "answer" -- but whose underlying
concern may also be fruitfully questioned
The original datasets of Problems, Strategies and Values -- as developed since
1972 -- have large numbers of relationships between records within and between
those databases. These relationships have been the subject of extensive "hyperlink
editing", most recently by Nadia McLaren, enabling extensive analysis
(cf Anthony Judge and Nadia McLaren,
Feedback Loop
Analysis in the Encyclopedia Project, Extract from the final
report on Information Context for Biodiversity Conservation, 2000).
From the above, given that this pattern has been preserved, it can be seen
that these are based on various types of relationship:
- Hierarchical
- Systemic or functional
- Cross-database
The generated Questions database is therefore an overlay of the above network of links in terms of:
- The 7 WH-questions
- The variants (max. 3) of some of those WH-questions
The generated questions are linked back to the entity from
which they were generated, whether a problem, a strategy or a value -- thus providing
another point of access to these datasets. This integration of the other databases
through specially framed questions may prove to be particularly valuable.
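The overlay can be pictured as copying each link between two seed entities onto the questions generated from them. The sketch below assumes, purely for illustration, that a link is carried over once per WH-type and joins questions of the same type. That assumption is at least consistent with the final table, where the Problems "Broader" counts (433244 main, 371352 variants) are 7 and 6 times the same figure, 61892. The function name and seed identifiers are hypothetical.

```python
WH_TYPES = ["When", "Where", "Which", "How", "What", "Who", "Why"]


def overlay_links(seed_links, wh_types=WH_TYPES):
    """Copy each seed-level link onto the questions generated from both seeds.

    seed_links: iterable of (source_seed, relation, target_seed).
    Assumption: questions are only linked to questions of the same WH-type.
    A question is identified here by the pair (seed, WH-type).
    """
    question_links = []
    for source, relation, target in seed_links:
        for wh in wh_types:
            question_links.append(((source, wh), relation, (target, wh)))
    return question_links


# Hypothetical seed entities and links.
seed_links = [
    ("P0002", "broader", "P0001"),
    ("P0002", "aggravates", "P0003"),
]
question_links = overlay_links(seed_links)
print(len(question_links))  # 14
print(question_links[0])    # (('P0002', 'When'), 'broader', ('P0001', 'When'))
```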
The question templates are necessarily different between databases and between
variants. The simplistic nature of the templates may not necessarily result
in question titles that are grammatically totally correct at this stage --
but sufficiently so for the purpose of this experiment.
Visualization of patterns of questions
The visualization possibilities for networks of interrelated Questions were
first envisaged as a contribution to the media project associated with the
launch of the Living Library by Dropping Knowledge (see Complementary
Knowledge Analysis / Mapping Process, April 2006), notably with respect
to the use of Netmap.
The visualization facilities implemented by Tomáš J.
Fülöpp for online exploration of the Questions database form
part of the set of options developed for exploration of the wider set of
databases: Problems, Strategies, Values, Organizations, etc. (Information
visualization and sonorification: displaying complexes of problems, strategies,
values and organizations). The facilities include:
- Circular metaphor: Representation of questions as nodes
on the circumference of a circle, with relationships expressed as lines linking
across the circle. Question titles are presented radially from each node,
outside the circle (and may be clickable to offer access to the question
entry). This representation takes several forms:
- PNG (Portable Network
Graphics), namely a bitmap image that is quickly generated and not
subject to any patent restrictions
- SVG (Scalable Vector
Graphics), namely an XML markup language enabling description of two-dimensional
vector graphics, both static and animated. This is an open standard created
by the World Wide Web Consortium. It has the considerable advantage of
being editable as a text file. As its name implies,
this format has the advantage of allowing the image to be scaled to any
size without loss of resolution. As implemented it allows the image to
be used as an index to individual questions.
- SWF (Small Web Format
or Shockwave Flash), is a widely used proprietary format for displaying
animated vector graphics on the web.
- Hierarchical metaphor:
- Hyperbolic metaphor: Representation of the network
of relationships between questions (as nodes) on a hypergraph,
namely projected onto a hypersphere using the open source HyperGraph application.
This is a java applet that offers a means to visualize hyperbolic geometry,
to handle graphs and to layout hyperbolic trees. It is especially useful
for exploring large volumes of data that have a degree of hierarchical
structure.
- FreeMind: This is a mind mapping application written
in java, typically providing extensive export capabilities. FreeMind has
numerous features, especially the ability to unfold and fold branches
of deep hierarchical structures, allowing for links between different
branches
- Spring map metaphor: Representation of the questions (as
nodes) linked by "springs" whose length adjusts to represent the
network of relationships in an optimal manner -- effectively in three dimensions.
This is done in a Java applet developed by Gerald de Jong. Any such spring
map is inherently dynamic in its constant search for a better equilibrium.
The position of its elements can be adjusted and fixed by the user for greatest
clarity.
- Scatter graph metaphor: Representation of questions and
links using a more common scatter
plot technique, whether in PNG, SVG or SWF formats (as described above)
All of these options allow the user to apply them to particular topic searches
and to adjust the complexity of the visualization. Various facilities are also
enabled to allow the user to colour features of the image.
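The circular metaphor, with its SVG output, is simple enough to sketch in a few lines of Python. This is not the implementation used on the site: the function names, sizes and question titles are hypothetical, and the radial labels outside the circle are replaced here by SVG tooltips to keep the example short.

```python
import math
from xml.sax.saxutils import escape


def circle_layout(labels, radius, cx, cy):
    """Place one node per label at equal angles on the circumference of a circle."""
    n = len(labels)
    positions = {}
    for i, label in enumerate(labels):
        angle = 2 * math.pi * i / n
        positions[label] = (cx + radius * math.cos(angle),
                            cy + radius * math.sin(angle))
    return positions


def to_svg(labels, links, size=240):
    """Return SVG text: links as lines across the circle, questions as small dots."""
    pos = circle_layout(labels, radius=size * 0.4, cx=size / 2, cy=size / 2)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">']
    for a, b in links:
        (x1, y1), (x2, y2) = pos[a], pos[b]
        parts.append(f'<line x1="{x1:.1f}" y1="{y1:.1f}" '
                     f'x2="{x2:.1f}" y2="{y2:.1f}" stroke="grey"/>')
    for label, (x, y) in pos.items():
        # The <title> child gives a tooltip; special characters are escaped for XML.
        parts.append(f'<circle cx="{x:.1f}" cy="{y:.1f}" r="3">'
                     f'<title>{escape(label)}</title></circle>')
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical questions and links.
labels = ["What is XXXX?", "Who causes XXXX?", "Where does XXXX occur?"]
links = [(labels[0], labels[1]), (labels[1], labels[2])]
svg = to_svg(labels, links)
print(svg)
```

Because the result is plain text, it can be saved with an .svg extension and opened in a browser, or edited by hand, which is the advantage of the format noted above.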
Experiments have also been made with the online use of virtual reality (VRML)
displays in three dimensions for these datasets (Using
VRML for an Overview of World Problems), but these have not yet
been enabled for the Questions data. Experiments have been made with the online
export of such data into third party packages such as Decision
Explorer -- a proven tool for managing "soft" issues, namely the qualitative
information that surrounds complex or uncertain situations. Again this has
not yet been enabled for Questions data, for which it could be very appropriate.
As suggested above, however, the proprietary Netmap package has been used
successfully offline in a preliminary exploration and visualization
of portions of the Questions dataset.
Possibilities from this approach
This work was inspired at an early stage by that of Stafford Beer, Syd Howell,
Alan Mossman, and Gordon Pask who developed a set of techniques on the occasion
of a conference on Improving the Human Condition: Quality and Stability
in Social Systems (Silver Anniversary International Meeting, London, 1979)
of the Society for General Systems Research (SGSR). The resulting documents,
tables and maps were presented in Metaconferencing
possibilities: Discovering people / viewpoint networks in conferences (1980).
Of particular relevance is their early use of a question-statement refinement
technique and mapping of the results as applied to an international conference
involving people well-disposed towards such techniques.
A particular exploration touching on WH-questions was made in various interrelated papers prior to this experiment:
Some interesting work could be done to refine the WH-question templates and to explore their functional relationships (as suggested in the above papers).
The hierarchical linkages between questions would provide a very interesting
technique for moving to more generic questions or "drilling" down
to more specific questions.
The functional links offer an interesting possibility for exploring learning pathways based on questions. In this context the detection and exploration of loops of questions (using network analysis techniques) raise many points of interest (cf the work of Ron Atkin on simplicial complexes).
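The detection of loops of questions can be illustrated with a standard depth-first search over a directed graph of functional links. This is a generic technique offered as a sketch, not the feedback loop analysis actually applied to the Problems and Strategies data. The question identifiers are hypothetical, and the search reports one loop per "back" link rather than every possible loop.

```python
def find_loops(links):
    """Find loops in a directed graph by depth-first search.

    links: dict mapping each question to the questions it points to
    (for example through an "aggravates" relationship).
    Returns a list of loops, each given as a path that ends where it began.
    """
    loops = []
    state = {}   # question -> "active" (on the current path) or "done"
    path = []

    def visit(node):
        state[node] = "active"
        path.append(node)
        for nxt in links.get(node, []):
            if state.get(nxt) == "active":
                # nxt is already on the current path, so a loop has been closed.
                loops.append(path[path.index(nxt):] + [nxt])
            elif nxt not in state:
                visit(nxt)
        path.pop()
        state[node] = "done"

    for node in links:
        if node not in state:
            visit(node)
    return loops


# Hypothetical functional links between four questions.
links = {"Q1": ["Q2"], "Q2": ["Q3"], "Q3": ["Q1", "Q4"], "Q4": []}
loops = find_loops(links)
print(loops)  # [['Q1', 'Q2', 'Q3', 'Q1']]
```

The recursion is adequate for small extracts of the dataset. A network on the scale of the full Questions database would call for an iterative version or a dedicated graph library.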
Additional features might include:
- A facility for users to impose their own preferred template on the seed entities
- It is possible that it would be relatively easy to apply automatic translation
techniques to such questions to obtain linguistic variants -- raising issues
of the correspondence to WH-questions in other languages and the possibility
of other forms of question not envisaged in this experiment.
Once the formatting is clarified, the issue is whether the database in its
entirety lends itself to interesting analysis, notably with the visualization
and other tools (already offered as options to the online presentation of search
results). Some thoughts are:
- Since we are dealing with questions, does the network of linkages
(Broader, Narrower, etc) make sense as a learning pathway?
- Can the questions be clustered in interesting ways, given that both the Problems
and Strategies were analyzed to discover interesting loops?
- Are the links between questions from different databases meaningful?
- What would it take to move beyond, or filter out, those questions that
could be identified as more contrived or artificial?
- How does this kind of experiment help to focus attention on "better" questions?
- To what extent does the pattern of questions (with the templates used) usefully
cover the forms of interrogation with regard to XXXX?
- Given the artificial templates used to generate the questions, is any meaning mainly to be derived at more abstract levels of analysis?
- Given the recurring significance to contemporary challenges of governance
of "problems", "strategies" and "values" -- from which the "questions" were
generated -- is there further significance to be obtained from any explicit
links to the "organizations" and "meetings" where their understanding and
importance are clarified? Are "questions" then to be usefully considered
as the underlying focus of "organizations", but especially of "meetings"?
Is there a case for recognizing that it is not new "answers" to old "problems"
which is the fundamental challenge, but rather new "questions" with respect
to those old "problems"?
Question of significance?
The key question with any visualization of a pattern of information, such
as that enabled by this experiment, is whether it offers any new insight. Clearly
unusual patterns of information can be generated, but do they lead to unusual
insights?
The visualization metaphors used have the advantage of holding disparate questions
in relationship to one another in a manner which suggests the possibility of
a more integrative perspective. Interacting with the various visualization
options may enhance that possibility. At issue are the conditions under which
such a perspective might emerge as more than an artifice of the design metaphor.
More fundamental is the issue of what forms of visualization enable new insight
and how they may compare with those explored with respect to these patterns
of relationship between questions.
Of particular interest is the
extent to which the patterns of questions may be compared with those characteristic
of stages in learning pathways and individuation processes (cf George Siemens, Connectivism:
Learning as Network-Creation, 2005).