
Joy in the Present
      

30th October 2006 | Draft

Generating a Million Questions from UIA Databases

Problems, Strategies, Values



Also published in modified form in Statistics, Visualizations and Patterns (Vol 5 of the Yearbook of International Organizations, Munich, K G Saur Verlag, 6th edition, 2006/2007, as sections 10.1.1 and 10.1.2)

Background

The experiment described below follows from an initial interest of the German Research Centre for Artificial Intelligence (DFKI), in support of the questions project of the international nonprofit organization Dropping Knowledge, as clarified during a workshop on the online databases of the Union of International Associations (Saarbrucken, 8 December 2005). Dropping Knowledge subsequently appropriated this information as the basis for establishing an online web facility to enable people worldwide to ask questions and to be exposed to answers -- thereby creating a "Living Library". The categorization of the questions was undertaken using the ontology developed by the UIA (cf Enabling a Living Library, 2006).

The concern here, in contrast, is whether it was possible to generate a Questions database from three long-established UIA databases: World Problems-Issues, Global Strategies-Solutions, and Human Values. These databases are part of the online Encyclopedia of World Problems and Human Potential, originally initiated in collaboration with Mankind 2000, whose development was most recently funded by the European Commission. The databases are integrated with others on international organizations, international meetings, biographies and bibliographies (cf Yearbook of International Organizations, International Congress Calendar).

The thousands of problems, strategies and values identified from the documented preoccupations of the network of international organizations (governmental and nongovernmental) provide a relatively objective focus for the generation of questions associated with those preoccupations -- or implicit in them. Clearly a particular interest in this experiment is to determine in what ways the result of generating questions could be meaningful and significant. The work builds on the possibilities of the use of such databases for simulations (cf Simulating a Global Brain: using networks of international organizations, world problems, strategies, and values, 2001).

WH-questions

There is an extensive literature on what are termed “WH-questions”, namely questions of the type: How? Why? Where? What? Which? When? Who? Further comments on studies in relation to such questions are noted below.

The Questions database was first generated experimentally in December 2005, and then more comprehensively in October 2006. In each case this was done by applying WH-question templates to the titles of Problems, Strategies and Values, embedding the "seed title" (XXXX) in a suitable template phrase. For example:

  • How is XXXX caused?
  • Who is responsible for XXXX?
  • Where does XXXX occur?
  • When does XXXX occur?
  • What is XXXX?
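The template mechanism described above amounts to simple string substitution. The following Python sketch is a minimal illustration under that assumption, not the procedure actually used by the UIA. The function name `generate_questions` and the sample seed title are hypothetical, and the placeholder XXXX is taken from the examples above.

```python
# Minimal sketch: embed a seed title into WH-question templates.
# "XXXX" is the placeholder used in the article; the function name and
# the sample seed title below are hypothetical.

TEMPLATES = [
    "How is XXXX caused?",
    "Who is responsible for XXXX?",
    "Where does XXXX occur?",
    "When does XXXX occur?",
    "What is XXXX?",
]


def generate_questions(seed_title: str, templates: list[str]) -> list[str]:
    """Return one question per template, with the seed title substituted for XXXX."""
    return [template.replace("XXXX", seed_title) for template in templates]


for question in generate_questions("water pollution", TEMPLATES):
    print(question)
```

Applied to every selected seed title, each template yields one question per seed. This is why the question counts reported below scale with the number of seeds multiplied by the number of templates.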

The range of templates is illustrated by the following table:

Templates used experimentally to generate questions
(each generated query = WH-query + Phrase-1 + Seed + Phrase-2; ** = not finally used)

World Problems-Issues (13 templates applied)
  • How much XXXX is there?
  • How does XXXX happen? (**)
  • How is XXXX caused? (**)
  • Why does XXXX happen?
  • Why give priority to XXXX?
  • Why does God allow XXXX to happen? (**)
  • Why be concerned by XXXX? (**)
  • Where does XXXX occur?
  • What is XXXX?
  • What causes XXXX?
  • What results in XXXX? (**)
  • Who causes XXXX?
  • Who is responsible for XXXX?
  • Who is concerned about XXXX?
  • When does XXXX occur?
  • When will XXXX occur?
  • When did XXXX arise?
  • Which kind of XXXX?

Global Strategies-Solutions (14 templates applied)
  • How can XXXX be enabled?
  • Why is XXXX unsuccessful?
  • Why give priority to XXXX?
  • Where is XXXX undertaken?
  • Where is XXXX successful?
  • What is required for XXXX?
  • What causes XXXX to fail?
  • Who undertakes XXXX?
  • Who is responsible for XXXX?
  • Who is concerned about XXXX?
  • When is XXXX undertaken?
  • When will XXXX be undertaken?
  • When was XXXX undertaken?
  • Which kind of XXXX?

Human Values: constructive or destructive (9 templates applied)
  • How is XXXX elicited?
  • Why is XXXX valued?
  • Why give priority to XXXX?
  • Where is XXXX found?
  • What is XXXX?
  • Who exemplifies XXXX?
  • Who values XXXX?
  • When is XXXX evident?
  • Which kind of XXXX?

Human Values: polarities (10 templates applied)
  • How are XXXX related?
  • How can XXXX be reconciled?
  • How can XXXX be transcended?
  • Why is the XXXX relation so challenging?
  • Where are XXXX reconciled?
  • What transcends XXXX?
  • Who embodies XXXX?
  • Who exemplifies the XXXX ambiguity?
  • When are XXXX transcended?
  • Which kind of XXXX relationship?

As is clear from the table above, different templates were used both according to the source database and according to the WH-Question. Since many Problems and Strategies have a number of alternative titles (notably employing synonyms), these too have been used as seeds for the generation of alternative titles for a question -- effectively constituting alternative formulations of the same question (but clustered together in the Question entry). Although they may be accessed through their keywords, they are not treated as distinctly profiled questions.
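The handling of alternative titles described above can be sketched as follows. The sketch assumes that each seed carries one main title and a list of alternative (synonym) titles, and that a single template is applied to all of them so that the resulting formulations are clustered under one Question entry. `QuestionEntry` and `build_entry` are hypothetical names, and the seed titles are invented examples rather than actual Encyclopedia entries.

```python
from dataclasses import dataclass, field


@dataclass
class QuestionEntry:
    """Hypothetical record: one question plus its alternative formulations."""
    main: str
    alternatives: list[str] = field(default_factory=list)


def build_entry(template: str, main_title: str, alt_titles: list[str]) -> QuestionEntry:
    """Apply one template to a seed's main title and to each of its alternative titles."""
    return QuestionEntry(
        main=template.replace("XXXX", main_title),
        alternatives=[template.replace("XXXX", title) for title in alt_titles],
    )


entry = build_entry(
    "What causes XXXX?",
    "deforestation",
    ["forest destruction", "loss of forest cover"],
)
print(entry.main)
for alternative in entry.alternatives:
    print("  alternative:", alternative)
```

Keeping the alternatives inside the same record reflects the point made above. They remain reachable, for example through a keyword index built over all formulations, but they are not profiled as separate questions.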

Preliminary results

The very preliminary results in generating these questions in December 2005 are indicated in the following table.

Preliminary results (December 2005)

             Seed entities        WH-templates  Questions generated
             Total     Selected   used          Main      WH-Variants  All
Problems     59205     12995      13            168935    239252
Strategies   42032     12848      14            179872    167426
Values       3257      3209       9/10          29111     16470
Totals       104494    29052                    377918    423148       801066
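The figures in this table are consistent with simple arithmetic. For Problems and Strategies, the Main figure equals the number of selected seeds multiplied by the number of templates used. The Totals row is the column sum, and the overall 801066 is Main plus WH-Variants. The Python sketch below checks this using only figures transcribed from the table. The "9/10" entry for Values presumably reflects the two Human Values template sets (9 for constructive or destructive values, 10 for polarities). Because the split of seeds between those two sets is not given here, only lower and upper bounds are checked for that row.

```python
# Figures transcribed from the December 2005 table.
# Tuple layout: (total seeds, selected seeds, templates used, main questions, WH-variants)
PRELIMINARY = {
    "Problems": (59205, 12995, 13, 168935, 239252),
    "Strategies": (42032, 12848, 14, 179872, 167426),
    # Values mixed two template sets (9 and 10), so no single template count applies.
    "Values": (3257, 3209, None, 29111, 16470),
}


def column_total(index: int) -> int:
    """Sum one numeric column across the three source databases."""
    return sum(row[index] for row in PRELIMINARY.values())


# Main questions = selected seeds x templates, where a single template count applies.
for name in ("Problems", "Strategies"):
    _, selected, templates, main, _ = PRELIMINARY[name]
    assert selected * templates == main, name

# With a 9/10 template mix, the Values figure must lie between the two products.
_, v_selected, _, v_main, _ = PRELIMINARY["Values"]
assert 9 * v_selected <= v_main <= 10 * v_selected

totals = {
    "seeds_total": column_total(0),
    "seeds_selected": column_total(1),
    "main": column_total(3),
    "wh_variants": column_total(4),
}
totals["all_questions"] = totals["main"] + totals["wh_variants"]
print(totals)
```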

 

Indication of distribution of seed entities by type (1996 and 2000; % = change from 1996 to 2000)

Problems-Issues
              Profiles                    Links
Type      1996     2000       %      1996      2000       %
A            0      196    n.a.         0     3,507    n.a.
B          170      187     10%     5,300     7,090     34%
C          575      722     26%    13,816    19,347     40%
D        2,162    2,740     27%    30,613    52,451     71%
E        3,857    5,378     39%    29,626    52,587     78%
F        3,072    3,917     28%    38,625    61,604     59%
G        2,153   30,279  1,306%     5,979    47,112    688%
Other      214   12,716  5,842%       905    26,255  2,801%
Total   12,203   56,135    360%   124,864   269,953    116%

Strategies-Solutions
              Profiles                    Links
Type      1996     2000       %      1996      2000       %
A            0    1,518    n.a.         0    16,767    n.a.
B          158      154     -3%     3,697     4,253     15%
C        1,100    1,089     -1%    17,096    25,206     47%
D        3,315    3,452      4%    19,374    43,329    124%
E        3,008    5,298     76%    11,092    50,677    357%
F        1,382    1,972     43%     7,015    19,580    179%
G        7,685   13,107     71%     3,604    69,059  1,816%
Other   12,850    6,105    -52%    61,129    34,070    -44%
Total   29,498   32,695     11%   123,007   262,941    114%
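Each % column can be reproduced from the two counts beside it. The value is the change from the 1996 count to the 2000 count as a whole-number percentage, shown as n.a. where the 1996 count is zero. The Python sketch below is a minimal reconstruction of that rule and is checked against a few rows of the Strategies-Solutions "Links" columns. `pct_change` is a hypothetical name. Note that Python's built-in `round()` rounds exact halves to even, which could differ by one from other rounding conventions.

```python
def pct_change(before: int, after: int) -> str:
    """Percentage change from the 1996 count to the 2000 count, as in the % columns.

    Returns "n.a." when the 1996 count is zero, since there is no base to compare against.
    """
    if before == 0:
        return "n.a."
    change = round(100 * (after - before) / before)
    return f"{change:,}%"  # thousands separator, e.g. "1,816%"


# (1996, 2000) pairs transcribed from the Strategies-Solutions "Links" columns.
links = {
    "A": (0, 16767),
    "B": (3697, 4253),
    "E": (11092, 50677),
    "G": (3604, 69059),
    "Other": (61129, 34070),
}
for kind, (y1996, y2000) in links.items():
    print(kind, pct_change(y1996, y2000))
```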

 

Indication of seed entity relationships

             Hierarchical links             Functional links
             Broader  Narrower  Related   Aggravating  Aggravated by  Reducing  Reduced by
Problems     26403    35500     14264     31024        31105          1507      1529
Strategies   27134    32541     3010      3302         2902           17826     16911
Values       11392
Totals

The subsequent generation of the Questions in October 2006 gave the following results:

Final results of question generation (October 2006)

                      Problems              Strategies            Values                Value-Polarities      Totals
                      Main     WH-Variants  Main     WH-Variants  Main     WH-Variants  Main     WH-Variants  Main     WH-Variants  All
Seed entities         45892                 31055                 2978                  229                   80154
WH-templates          7        6            7        7            7        2            7        3            28       18           46
Main                  321244   -            217385   -            20846    -            1603     -            561078   -            561078
(Alternative titles)  133917   114786       107422   107422       9807     2802         -        -            -        -            -
WH-Variants           -        275352       -        217385       -        5956         -        687          -        499380       499380
Total questions       321244   275352       217385   217385       20846    5956         1603     687          561078   499380       1060458
Broader               433244   371352       370174   370174       79765    22790        0        687          883183   765003       1648186
Narrower              346577   297066       284396   284396       0        0            79744    34176        710717