
18 July 2012 | Not completed

Visualizing Latent Significance in Patterns of Relationships

a case study in relative incompetence



Introduction

This provides a brief record of a range of initiatives to map relationships between international organizations, world problems, global strategies, and related entities. It covers a period dating back to the early 1970s, using a variety of computer-enabled techniques including network analysis, scalable vector graphics, spring maps, virtual reality, and other possibilities. The purpose has been to shift the level of debate to enable discussion of patterns rather than isolated concerns and possibilities. Some of the results of these initiatives have been made available through an interactive online facility ****

The concern at this time is to review the possibility of eliciting a new level of meaning from the set of documents written by the author over that same period. These have notably addressed the wider strategic significance of the entities profiled in that period. It is of course the case that the range of computer-enabled techniques purportedly able to facilitate such an exercise has increased considerably, especially in recent years.

The review is written from the perspective of someone who has made extensive use of any technical facilities available, and with a degree of success -- using a modest level of programming capacity. The question addressed is how these might be fruitfully applied to extract any latent meaning in the documents written during that process.

With the multiplication of web search facilities, the wider concern is how insight is to be extracted from sets of documents amongst which there are presumably significant patterns of relationships. How is that pattern to be comprehended? Why are such facilities less readily available than might be assumed?

Potentially more important is why there is relatively little interest Club of Rome

web maps information global sensemaking

Summary of past initiatives

The following indications are provided in order to give a sense of the context within which the quest for higher orders of meaning, "buried" in sets of documents, is pursued -- and described below. It also gives a sense of the competence available in that quest.

A general checklist of documents written specifically with regard to the matter is provided at: Visualization, Presentation, Mapping

The most recent experiments have focused on the configuration of patterns of categories and insights using polyhedral structures, as introduced in Polyhedral Pattern Language: software facilitation of emergence, representation and transformation of psycho-social organization (2008).

A summary of previous approaches to visualization is provided in: Experimental Visualization of Networks: world problems, international organizations, global strategies and human values (2007). This points to the following:

Possibilities of knowledge mapping and associated experiments were reviewed in:

Earlier framings of the challenge appeared as:

Current approaches to derivation of significance from documents

The following approaches have been considered in relation to the challenge of extracting significance from a set of 2000 documents held on the Laetus in Praesens website. A network of 5000 citation links exists amongst that set of documents. The documents cite 4000 external authors. The documents have been segmented, given their size (up to 150k), resulting in a larger set of 9000 documents. These are held in an associated content management system (Kairos).
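The structure described above can be pictured as two relations: documents citing other documents, and documents citing external authors. The following is a minimal sketch under stated assumptions, not a description of how the data is actually held in Kairos. The identifiers (`doc_a`, `Author X`) and the function names are hypothetical, and the toy data stands in for the real set.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data; the real set has about 2000 documents,
# 5000 internal citation links and 4000 external authors.
doc_links = {            # document -> documents it cites
    "doc_a": {"doc_b", "doc_c"},
    "doc_b": {"doc_c"},
    "doc_c": set(),
}
doc_authors = {          # document -> external authors it cites
    "doc_a": {"Author X", "Author Y"},
    "doc_b": {"Author Y", "Author Z"},
    "doc_c": {"Author X", "Author Y"},
}

def in_degree(links):
    """Count how often each document is cited by other documents."""
    counts = Counter()
    for targets in links.values():
        counts.update(targets)
    return counts

def co_citation(authors_by_doc):
    """Count how often each pair of authors is cited in the same document."""
    pairs = Counter()
    for authors in authors_by_doc.values():
        for pair in combinations(sorted(authors), 2):
            pairs[pair] += 1
    return pairs

cited = in_degree(doc_links)
pairs = co_citation(doc_authors)
```

Simple counts of this kind (which documents are most cited, which authors tend to be cited together) are only a starting point, but they indicate the raw material on which the approaches considered here would operate.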

Possibilities for extraction of significance include:

Framing the possibility of cluster analysis

One of the most readily available analytical techniques in statistical packages is cluster analysis (one of the techniques of data mining). In considering its use, and in the spirit of this review, the following can usefully be borne in mind:

Accessible statistical clustering applications

Following various recommendations and criteria, the following packages were considered.

Cluster: An open source clustering package, most notably used to analyze gene expression data. The output can be visualized using Java TreeView, an open source, extensible viewer for microarray data in the PCL or CDT format.

Gephi: This is an open source, interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs.

Tulip: This is an information visualization framework dedicated to the analysis and visualization of relational data. It aims to provide the developer with a complete library, supporting the design of interactive information visualization applications for relational data that can be tailored to the problems addressed.

MicrOsiris Statistical Analysis and Data Management Software:

R (programming language): R is an open source programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians for developing statistical software and data analysis. The capabilities of R are extended through user-created packages, which allow specialized statistical techniques, graphical devices, import/export capabilities, reporting tools, etc. These packages are developed primarily in R. They notably include a package for Latent Semantic Analysis, which can be freely downloaded.

Latent Semantic Analysis (LSA): There are many web resources for LSA.

Please find enclosed a quick analysis in Matlab of your data. Matlab itself is not important; it just happens that a script is available for Probabilistic Latent Semantic Analysis (PLSA), which I believe matches very well what you want: to discover patterns of authors and at the same time see which patterns are present in the citations of your documents. Should you like to adapt this in R (Matlab is not free, I used the one at work), I think it will be very easy to translate the script into R, since the syntaxes are very close (see cran.r-project.org/doc/contrib/Hiebeler-matlabR.pdf) and the algorithm is in fact quite simple. I think that with a little Googling you could find versions of PLSA in C++ or similar languages to use. The link for the Matlab script is in the attached file. I know that LSA/PLSA normally use matrices of rows = documents and columns = words, and that in your case the columns are authors; however this is not a problem, since PLSA simply associates two variables -- it has even been used on image data, if I remember well. The point is that with PLSA you will be able to identify patterns of authors (where authors can belong to several patterns) and see which patterns are associated with a document (a document could have one or more patterns in its citations).
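To give a sense of how simple the PLSA algorithm is, the following is a minimal sketch in plain Python of the standard expectation-maximization procedure, applied to a small documents-by-authors count table. It is not the Matlab script referred to above, which is not reproduced here. The function names (`plsa`, `pattern_mix`), the parameters and the toy table are all assumptions made for illustration.

```python
import random

def plsa(counts, n_patterns, n_iter=100, seed=0):
    """Fit PLSA by EM on a documents x authors count table (list of lists).

    Returns (p_z, p_d_z, p_a_z): pattern weights P(z), P(document | z)
    and P(author | z).
    """
    rng = random.Random(seed)
    n_docs, n_auth = len(counts), len(counts[0])

    def random_dist(n):
        w = [rng.random() + 1e-3 for _ in range(n)]
        s = sum(w)
        return [x / s for x in w]

    p_z = [1.0 / n_patterns] * n_patterns
    p_d_z = [random_dist(n_docs) for _ in range(n_patterns)]
    p_a_z = [random_dist(n_auth) for _ in range(n_patterns)]

    for _ in range(n_iter):
        new_z = [0.0] * n_patterns
        new_d = [[0.0] * n_docs for _ in range(n_patterns)]
        new_a = [[0.0] * n_auth for _ in range(n_patterns)]
        for d in range(n_docs):
            for a in range(n_auth):
                n = counts[d][a]
                if n == 0:
                    continue
                # E-step: share of the pair (d, a) attributed to each pattern
                joint = [p_z[z] * p_d_z[z][d] * p_a_z[z][a]
                         for z in range(n_patterns)]
                total = sum(joint)
                if total == 0.0:
                    continue
                for z in range(n_patterns):
                    r = n * joint[z] / total
                    new_z[z] += r
                    new_d[z][d] += r
                    new_a[z][a] += r
        # M-step: renormalize the expected counts into probabilities
        grand = sum(new_z)
        for z in range(n_patterns):
            if new_z[z] > 0.0:
                p_d_z[z] = [x / new_z[z] for x in new_d[z]]
                p_a_z[z] = [x / new_z[z] for x in new_a[z]]
            p_z[z] = new_z[z] / grand
    return p_z, p_d_z, p_a_z

def pattern_mix(p_z, p_d_z, d):
    """P(pattern | document d), from P(z) * P(d | z) by Bayes' rule."""
    w = [p_z[z] * p_d_z[z][d] for z in range(len(p_z))]
    s = sum(w)
    return [x / s for x in w]

# Hypothetical toy table: rows are documents, columns are cited authors.
counts = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
p_z, p_d_z, p_a_z = plsa(counts, n_patterns=2)
mix_first_doc = pattern_mix(p_z, p_d_z, 0)
```

Each row of `p_a_z` can be read as a "pattern of authors", and `pattern_mix` indicates which patterns are present in a given document. The number of patterns has to be chosen in advance, and the result depends on the random starting point, so several runs with different seeds would be a sensible precaution.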

Open Office: The Calc portion of the open-source Open Office suite does not itself offer cluster analysis as an adjunct to its spreadsheet facility. However, there appears to be a range of third party packages which can interface with it in some way; some are freeware, some offer a free trial, and others are simply proprietary.

ViSta: the Visual Statistics System: This features statistical visualizations that are highly dynamic and very interactive.

A set of test data was prepared as a .TXT or .CSV file -- a simple table of cited authors against papers citing. The program written for the purpose used the DOS-based Advanced Revelation, running in a DOS window on Windows 7. The data table was extracted from tables used in the process of converting .PHP documents on the Laetus in Praesens website for import into the Drupal CMS.
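A table of this kind can be read and clustered with very little code. The sketch below is offered only as an indication. The column layout of the actual test file is not documented here, so a header row of author names and 0/1 cells are assumed. The sample data, the function names and the threshold are hypothetical. The technique shown is single-linkage agglomerative clustering on Jaccard distance, chosen for brevity rather than because any of the packages above uses it by default.

```python
import csv
import io

# Hypothetical sample: one row per citing paper, one column per cited author.
SAMPLE = """paper,Author X,Author Y,Author Z
p1,1,1,0
p2,1,1,0
p3,0,0,1
p4,0,1,1
"""

def load_table(text):
    """Return {paper: set of cited authors} from a paper-by-author table."""
    reader = csv.reader(io.StringIO(text))
    authors = next(reader)[1:]
    table = {}
    for row in reader:
        if not row:
            continue
        table[row[0]] = {a for a, cell in zip(authors, row[1:])
                         if cell.strip() not in ("", "0")}
    return table

def jaccard_distance(s, t):
    """1 - |intersection| / |union|: 0 for identical sets, 1 for disjoint."""
    union = s | t
    return 1.0 - len(s & t) / len(union) if union else 0.0

def cluster(table, threshold):
    """Single-linkage agglomerative clustering: keep merging the closest
    pair of clusters while their distance does not exceed `threshold`."""
    clusters = [[p] for p in sorted(table)]

    def linkage(c1, c2):
        return min(jaccard_distance(table[p], table[q])
                   for p in c1 for q in c2)

    while len(clusters) > 1:
        dist, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if dist > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

table = load_table(SAMPLE)
groups = cluster(table, threshold=0.5)
```

With this sample, papers citing the same authors fall into the same group. On the full table a brute-force loop of this kind would be slow, which is one reason for turning to the dedicated packages listed above.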

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
