eventChemistry ™ .
Determining Functional Load using SLIP
by Paul Prueitt, PhD
Founder (1997), BCNGroup.org
President, OntologyStream Inc.
Draft: December 27, 2001
The concept of functional load is addressed at some length in John Lyons’ book “Introduction to Theoretical Linguistics” (Cambridge University Press, 1968). In Lyons’ book, the notion of functional load is treated as a cause of the distribution of basic compositional elements related to spoken and written expression. This notion is part of the tradition in theoretical linguistics and follows the work of de Saussure (Course in General Linguistics, Payot, 1916) and Z. Harris (Methods of Structural Linguistics, Univ. Chicago Press, 1951).
In essence, the notion is that sounds that are easy to make will be used in situations where ambiguity of expression has some penalty. So a basic investigation of auditory and acoustic phonetics leads to an understanding of how language is used and how it evolves. Auditory and acoustical coherence and discordance are reflected in the structure and form of natural language. The investigation leads to partial knowledge when the target of investigation is a complex system.
My background is not so strong in theoretical linguistics that I can proceed right away into a discussion of phonetics and grammar and make comparisons between the internal structures of various language groups. I will leave this to others. In any case, this is not the primary purpose of the OntologyStream Inc. (OSI) Browsers. However, we suggest that these browsers can be used as teaching and investigation tools for distributional analysis in general terms. In this way, distributional analysis leads to the notion of event chemistry even if the target of investigation is not linguistically focused. The event chemistry has to have:
1) A theory of substructure (illustrated by the making and hearing of sounds in Lyons’ perspective on functional load)
2) Laws or rules of assembly, even if these laws are distributed and not precise
3) Expectancy
Natural language can be considered as a complex system that is in fact stratified into organizational levels related to how language is remembered, how the human body generates cognition and language expression, and how natural language is reinforced within a social system. But natural language is only one example of a complex system.
OSI Browsers are being developed to study the following complex systems:
Clearly Natural Language Processing (NLP) will provide some tools, particularly with #3 and #4, where it is essential that concepts and themes expressed in natural language are a significant part of the study.
But NLP work presents us with a dilemma. This dilemma is well stated by Lyons at the end of his second Chapter:
“Two apparently contradictory principles have been maintained in this section: first, that statistical considerations are essential to an understanding of the operation and development of languages; second, that it is in practice (and perhaps also in principle) impossible to calculate precisely the information carried by linguistic units in actual utterances. This apparent contradiction is resolved by recognizing that linguistic theory, at the present time at least, is not, and cannot be, concerned with the production and understanding of utterances in their actual situations of use (except for a relatively small class of language-utterances which can be handled directly in this way), but with the structure of sentences considered in abstraction from the situations in which actual utterances occur.” (page 98)
The link analysis made using the Warehouse Browser provides a weak measure of functional load when the induced metric is distributed and used to control an emergent computing process in the scatter-gather of the SLIP Technology Browser. A stronger measure of functional load is possible only as domain experts begin to develop a specific understanding of the event atoms and the patterns of co-occurrence that are seen in practice. This encoding can be facilitated through knowledge management principles.
What one expects is the development of a type of periodic table of elementary event atoms. The development of this table is based on experimental results involving the derived relationships between occurrences of atoms in event logs. In computer intrusion work, these atoms might be IP addresses or port values. In text understanding, the atoms may be co-occurrences of words in paragraphs or other text units.
Figure 1: Atoms from one of the SLIP categories
In Figure 1 we show the event atoms for a category derived from a quick study of the functional loads of the Aesop collection. As in Latent Semantic Indexing (LSI), we focus on a relationship between the membership of an “internal token” and a profile of the larger group. In this case the token is a member of a set of referent tokens (a very simple, hand-made dictionary) and the larger group is the collection of individual fables.
Table 1: the data set for Figure 1
token    name (fable number)
begged 260
protected 260
placed 222
fox 222
fox 222
house 222
sailing 232
keep 232
inquired 232
see 232
enemies 232
enemies 232
storm 232
danger 232
ends 232
enemy 232
seeing 133
get 133
another 133
passing 133
heard 133
inquired 133
happened 133
get 133
fox 133
fox 133
meat 133
shepherds 133
fate 133
fox 133
cries 133
friend 133
tradesmen 112
called 112
protecting 112
proposed 112
method 112
stood 112
enemy 112
preferable 112
defense 112
striving 176
led 176
manage 176
save 176
calf 176
calf 176
offered 176
argued 200
hares 200
lions 200
hares 200
assembly 200
lions 200
words 200
hares 200
teeth 200
lay 236
appeared 236
dogs 236
house 236
dog 236
house 236
summer 236
house 236
addressed 170
freedom 170
put 170
eat 170
give 170
favorably 170
wolves 170
wolves 170
mind 170
brothers 170
slave 170
bones 170
dogs 170
proposals 170
wolves 170
fell 121
allowing 121
share 21
fellows 121
fell 121
milk-woman 121
farmers 121
milk 121
field 121
money 121
milk 121
end 121
money 121
fellows 121
moment 121
milk 121
ground 121
schemes 121
Looking at Figure 1 we see that atom 260 (one of the fables) has only two valances, and atom 222 has three. By “valance” we mean here that the Analytic Conjecture has established an inference regarding how fables are related to each other via the Dictionary of tokens. This fact is reflected in Table 1.
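The counts read off Figure 1 can be checked against Table 1. The following is a minimal sketch, not part of the OSI Browsers. It assumes that a valance can be approximated as one distinct token per atom, and it uses only the Table 1 rows for atoms 260 and 222. The function name is hypothetical.

```python
from collections import defaultdict

# Rows copied from Table 1 for atoms 260 and 222, as (token, atom).
rows = [
    ("begged", 260), ("protected", 260),
    ("placed", 222), ("fox", 222), ("fox", 222), ("house", 222),
]


def distinct_tokens_by_atom(rows):
    """Group tokens by atom, collapsing repeated tokens (assumption:
    a repeated token counts once toward the atom's valances)."""
    tokens = defaultdict(set)
    for token, atom in rows:
        tokens[atom].add(token)
    return dict(tokens)


valances = {atom: len(toks)
            for atom, toks in distinct_tokens_by_atom(rows).items()}
print(valances)
```

Under this reading, atom 260 has two distinct tokens and atom 222 has three, which agrees with the figure.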
Looking at concept linkage and functional load
In the simple exercise that follows, we look at the concept linkage between the elements of text in a collection. The concepts are weakly represented by a collection of nouns and verbs that have been extracted from the fable collection. Functional load is to be identified through whatever means we can. The first step towards obtaining a validated theory of the functional load in the fable collection is to build a first approximation using the co-occurrence between individual fables and a unified list of nouns and verbs (called the dictionary). A datawh.txt file was produced for this purpose in the previous exercise.
A deeper study of functional load related to the fable collection can be made. One could, for example, parse the fables and identify when a noun and a verb from the Dictionary are both contained in the fable. The events reported out to a new datawh.txt would then have the form
( noun, verb, fable name )
Then the analytic conjecture could be done using nouns as the “a” value and verbs as the “b” value. But in this Exercise, we keep things very simple for illustration purposes.
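The deeper study described above can be sketched as follows. This is an illustration only: the fable names, sentences, and word lists are invented for the example and are not taken from the fable collection or the Dictionary, and whitespace splitting stands in for real parsing. The layout of datawh.txt is not assumed here, so the sketch stops at producing the triples.

```python
from itertools import product


def noun_verb_events(fables, nouns, verbs):
    """Yield (noun, verb, fable name) for every dictionary noun and
    dictionary verb that occur together in the same fable."""
    for name, text in fables.items():
        words = set(text.lower().split())
        found_nouns = sorted(words & nouns)
        found_verbs = sorted(words & verbs)
        for noun, verb in product(found_nouns, found_verbs):
            yield (noun, verb, name)


# Hypothetical miniature inputs, invented for illustration.
fables = {
    "fable_A": "the master killed his oxen for food",
    "fable_B": "the jackdaw joined the doves for food",
}
nouns = {"food", "oxen", "jackdaw"}
verbs = {"killed", "joined"}

events = list(noun_verb_events(fables, nouns, verbs))
for event in events:
    print(event)
```

With triples of this form, the noun would serve as the “a” value and the verb as the “b” value of the analytic conjecture.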
Table 1 is the Report generated by the Technology Browser for category ‘R-D-level’ (residue at the D level).
Figure 2 (panels a and b): The Analytic Conjecture for tokens, and a SLIP Framework
In working with this exercise, we have two resources. First, the fables in the collection are posted one URL at a time using the string:
(www.) + ontologystream.com/IRRTest/fables/BEAD(N).HTM
where “(N)” is replaced (manually) by the atom number. We have checked a few of these, but not all.
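The manual substitution can be scripted. This is a small sketch under two assumptions that the text does not state: that the scheme is `http`, and that the atom number is inserted without zero padding. The function name `fable_url` is hypothetical.

```python
def fable_url(atom_number, scheme="http"):
    """Build the URL for one fable from its atom number.

    Assumptions: the scheme defaults to http and the number is
    inserted as-is, with no padding.
    """
    return (f"{scheme}://www.ontologystream.com"
            f"/IRRTest/fables/BEAD{atom_number}.HTM")


print(fable_url(113))
print(fable_url(191))
```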
The idea, with the fable collection, has been to prototype a BeadGame Communities software system based on work considered and made over a period of a decade by the BCNGroup.org Foundation. However, this long-term goal requires that our group make economic gains first. These gains will come from the application of the NLP Browser to the analysis of patent information and stockholder reports.
So we use the fable collection as a means to illustrate what we have by way of technology and where there are still software development issues that need to be solved.
The reader is now asked to download this exercise’s zip file, TAI.zip (289 K zipped, including the three browsers and a data set). The code is Visual Basic developed by OntologyStream.com.
After opening the Warehouse Browser (SLIPWhse.1.2.0.exe), one will see Figure 2a. Opening the Technology Browser (SLIP.2.2.3.exe) will show the user a computer interface that looks like Figure 2b.
The tree-like structure seen in Figure 2b, called the SLIP framework, is developed by taking the elements of the category A1 and randomly scattering these elements (called SLIP atoms) to the circle. One can review the numerous earlier exercises to see how these SLIP atoms are developed. The SLIP atoms are the “a” values from the analytic conjecture that were found (by the Technology Browser) to have been paired with a “b” value. In this case, the SLIP atoms are names of fables (labeled by the fable number).
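The pairing and scattering just described can be sketched in a few lines. This is an illustrative reading rather than the Technology Browser's code: it assumes two “a” values are paired when they share a “b” value, and that the columns are selected by index in the same way as the commands “a = 1” and “b = 0”. The records are a tiny subset of Tables 1 and 2, and the function names are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def paired_atoms(records, a=1, b=0):
    """Return unordered pairs of distinct 'a' values that share a 'b' value."""
    by_b = defaultdict(set)
    for rec in records:
        by_b[rec[b]].add(rec[a])
    pairs = set()
    for atoms in by_b.values():
        for x, y in combinations(sorted(atoms), 2):
            pairs.add((x, y))
    return pairs


def scatter(atoms, rng):
    """Place each atom at a random angle (in degrees) on the circle."""
    return {atom: rng.uniform(0.0, 360.0) for atom in atoms}


# (token, fable) rows taken from Tables 1 and 2.
records = [("food", "191"), ("food", "113"),
           ("begged", "260"), ("protected", "260")]

pairs = paired_atoms(records)
atoms = sorted({atom for pair in pairs for atom in pair})
positions = scatter(atoms, random.Random(0))
print(pairs, positions)
```

In this small subset, fable 260 shares no token with another fable, so it does not become a SLIP atom. Only 113 and 191 are scattered.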
So let us see how the SLIP atoms are created.
If you have not already done so, download TAI.zip and unzip it into an empty folder. You may delete the contents of the Data folder except for the single file datawh.txt.
Figure 3: TAI.zip unzipped
Then click on SLIPWhse.1.2.0.exe. Enter the commands “a = 1” and then “b = 0”. Enter the command “pull”, followed by the command “export”.
You can then look into the Data folder and see that several new files exist, among them Paired.txt (used by the Technology Browser) and Links (used by the Event Browser).
Now click on SLIP.2.3.3.exe. The Technology Browser starts with an empty A1 node in the topic graph window. Enter the commands “import” and then “extract”. One can type in “help” to find out more about these functions. Then click on the A1 node.
The development of the SLIP Framework can be automated. But currently we insist on the user developing the Framework by visually finding areas of interest in the emergent clustering that occurs on the circle. Enter the command “random” and then the command “cluster”.
The cluster command will produce 100,000 iterations of a seek function. For this data, with this analytic conjecture, that is too many iterations. Enter the command “random” and then “cluster 2”. Pressing Return should then add 2,000 iterations each time. By the time you get to 12,000 iterations your gather process should look something like Figure 4b.
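The seek function itself is not specified in this exercise, so what follows is only one plausible reading, offered as a minimal sketch. Each iteration picks two atoms at random and, if they are paired, moves them slightly toward each other on the circle. The names `gather`, `angular_gap`, and `step`, and the starting angles, are hypothetical and are not taken from the Technology Browser.

```python
import random


def angular_gap(a, b):
    """Signed shortest rotation in degrees from angle a to angle b,
    in the range [-180, 180)."""
    return (b - a + 180.0) % 360.0 - 180.0


def gather(positions, pairs, iterations, rng, step=0.1):
    """Assumed seek rule: pick two atoms at random; if they are paired,
    move each one a fraction `step` of the gap toward the other."""
    atoms = list(positions)
    for _ in range(iterations):
        x, y = rng.sample(atoms, 2)
        if (x, y) in pairs or (y, x) in pairs:
            gap = angular_gap(positions[x], positions[y])
            positions[x] = (positions[x] + step * gap) % 360.0
            positions[y] = (positions[y] - step * gap) % 360.0
    return positions


# Hypothetical starting angles; 113 and 191 are paired, 260 is not.
positions = {"113": 10.0, "191": 100.0, "260": 250.0}
pairs = {("113", "191")}

# 2,000 iterations, the increment the text gives for "cluster 2".
gather(positions, pairs, iterations=2000, rng=random.Random(0))
print(positions)
```

Under this rule paired atoms drift together while unpaired atoms stay where the scatter left them, which is the kind of emergent grouping one looks for on the circle.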
Figure 4 (panels a and b): Clustering the top node
We are looking for a small group of atoms. A small group helps us point out specific features of the event chemistry related to that group. We may take the middle out of the large distribution and look into the complement. To do this, type in a bracket command “x, y -> B1”, where x and y bracket the interior of the large distribution. Then click on the A1 node and type “residue” to put the complement of B1 into a second category.
Now click on the residue category, labeled “R”, and type “random”. Without clustering, take 90 degrees (any 90 degrees) and put the atoms into the category C1. Do this by typing, for example, “45, 135 -> C1”. Click on the new category. If you type “cluster” you will most likely see the quick formation of several groups that do not move together.
Choose the largest of these (hopefully you will have at least two and fewer than five elements in this group). If not, go to the Data folder, delete the subfolders of A1, and type “load” into the command line. This will allow you to start over.
Once you have a category with two to five elements, type in “key = 1”, click on the Report button, and type “generate” in the command line.
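The bracket and residue commands can be read as a simple partition of the atoms by their angle on the circle. The sketch below is an interpretation, not the browser's implementation. It assumes angles in degrees, an inclusive bracket with x <= y, and no wrap past 360. The atom labels and angles are hypothetical.

```python
def bracket(positions, x, y):
    """Atoms whose angle lies in [x, y] degrees (assumes x <= y)."""
    return {atom for atom, angle in positions.items() if x <= angle <= y}


def residue(parent, child):
    """Atoms of the parent category that were not bracketed into the child."""
    return set(parent) - set(child)


# Hypothetical atoms and angles after a "random" scatter.
positions = {"a": 30.0, "b": 60.0, "c": 120.0, "d": 200.0}

c1 = bracket(positions, 45, 135)   # corresponds to "45, 135 -> C1"
r = residue(positions, c1)         # corresponds to "residue"
print(sorted(c1), sorted(r))
```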
Figure 5: the Report for a category of two atoms.
You will have different results than I. However, you will have captured two to five atoms that
1) Are all connected by the co-occurrence of one or more tokens and
2) Have an average number of valances that connect to atoms outside the category.
The two fables that I have identified here are (191) The Jackdaw and the Doves, and (113) The Master and His Dogs. The linkage is via the token “food”.
Table 2: The valances of 191 and 113
token    name (fable number)
seeing 191
painted 191
joined 191
share 191
discovered 191
recognizing 191
desiring 191
jackdaw 191
jackdaw 191
food 191
day 191
character 191
food 191
jackdaws 191
ends 191
killed 113
obliged 113
seeing 113
took 113
own 113
master 113
dogs 113
storm 113
country 113
house 113
goats 113
household 113
storm 113
yoke 113
oxen 113
food 113
dogs 113
counsel 113
time 113
master 113
oxen 113
friend 113
The Event Browser was used to see these two atoms (Figure 6).
The Event Browser is not yet completed, so we have to select the folder and the Members.txt to see the atoms.
Figure 6 (panels a and b): The selection of the Members file in the node and the event atoms
In Figure 6b we see 113 and 191. An examination of the valance file will show that these two atoms have no relationship to any of the other atoms in this sample.
Please send comments to Dr. Paul Prueitt.