Determining Functional Load using SLIP

by Paul Prueitt, PhD

Founder, (1997) BCNGroup.org

President, OntologyStream Inc.

Draft: December 27, 2001

The concept of functional load is addressed at some length in John Lyons’ book “Introduction to Theoretical Linguistics” (Cambridge University Press, 1968). In Lyons’ book, the notion of functional load is treated as a cause of the distribution of the basic compositional elements of spoken and written expression. This notion is part of the tradition in theoretical linguistics and follows the work of de Saussure (Course in General Linguistics, Payot, 1916) and Z. Harris (Methods of Structural Linguistics, Univ. Chicago Press, 1951).

In essence, the notion is that sounds that are easy to make will be used in situations where ambiguity of expression has some penalty. So a basic investigation of auditory and acoustic phonetics leads to an understanding of how language is used and how it evolves. Auditory and acoustical coherence and discordance are reflected in the structure and form of natural language. The investigation leads only to partial knowledge when the target of investigation is a complex system.

My background in theoretical linguistics is not strong enough for me to proceed right away into a discussion of phonetics and grammar, or to make comparisons between the internal structures of various language groups. I will leave this to others. In any case, this is not the primary purpose of the OntologyStream Inc (OSI) Browsers. However, we suggest that these browsers can be used as teaching and investigation tools for distributional analysis in general terms. In this way, distributional analysis leads to the notion of event chemistry even if the target of investigation is not linguistically focused. The event chemistry has to have:

1) A theory of substructure (illustrated by the making and hearing of sounds in Lyons’ perspective on functional load)

2) Laws or rules of assembly, even if these laws are distributed and not precise

3) Expectancy

Natural language can be considered as a complex system that is in fact stratified into organizational levels related to how language is remembered, how the human body generates cognition and language expression, and how natural language is reinforced within a social system. But natural language is only one example of a complex system.

OSI Browsers are being developed to study the following complex systems:

1) Telecommunications systems

2) Systems of financial transactions

3) Activities of the Patent and Trademark Office

4) Virtual discussion in electronic forums

5) Hacker activity in the Internet

Clearly Natural Language Processing (NLP) will provide some tools, particularly with #3 and #4, where the concepts and themes expressed in natural language are a significant part of the study.

But NLP work presents us with a dilemma. This dilemma is well stated by Lyons at the end of his second Chapter:

“Two apparently contradictory principles have been maintained in this section: first, that statistical considerations are essential to an understanding of the operation and development of languages; second, that it is in practice (and perhaps also in principle) impossible to calculate precisely the information carried by linguistic units in actual utterances. This apparent contradiction is resolved by recognizing that linguistic theory, at the present time at least, is not, and cannot be, concerned with the production and understanding of utterances in their actual situations of use (except for a relatively small class of language-utterances which can be handled directly in this way), but with the structure of sentences considered in abstraction from the situations in which actual utterances occur.” – page 98.

The link analysis made using the Warehouse Browser provides a weak measure of functional load when the induced metric is distributed and used to control an emergent computing process in the scatter-gather of the SLIP Technology Browser. A stronger measure of functional load is possible only as domain experts begin to develop a specific understanding of the event atoms and the patterns of co-occurrence that are seen in practice. This encoding can be facilitated through knowledge management principles.

What one expects is the development of a type of periodic table of elementary event atoms. The development of this table is based on experimental results involving the derived relationships between occurrences of atoms in event logs. In computer intrusion work, these atoms might be IP addresses or port values. In text understanding, the atoms may be co-occurrences of words in paragraphs or other text units.

Figure 1: Atoms from one of the SLIP categories

In Figure 1 we show the event atoms for a category derived from a quick study of the functional loads of the Aesop collection. As in Latent Semantic Indexing (LSI), we focus on a relationship between the membership of an “internal token” and a profile of the larger group. In this case the token is a member of a set of referent tokens (a very simple, handmade dictionary) and the larger group is the collection of individual fables.

Table 1: the data set for Figure 1

token         name
begged        260
protected     260
placed        222
fox           222
house         222
sailing       232
keep          232
inquired      232
see           232
enemies       232
storm         232
danger        232
ends          232
enemy         232
seeing        133
get           133
another       133
passing       133
heard         133
inquired      133
happened      133
get           133
fox           133
meat          133
shepherds     133
fate          133
fox           133
cries         133
friend        133
tradesmen     112
called        112
protecting    112
proposed      112
method        112
stood         112
enemy         112
preferable    112
defense       112
striving      176
led           176
manage        176
save          176
calf          176
offered       176
argued        200
hares         200
lions         200
hares         200
assembly      200
lions         200
words         200
hares         200
teeth         200
lay           236
appeared      236
dogs          236
house         236
dog           236
house         236
summer        236
house         236
addressed     170
freedom       170
put           170
eat           170
give          170
favorably     170
wolves        170
mind          170
brothers      170
slave         170
bones         170
dogs          170
proposals     170
wolves        170
fell          121
allowing      121
share         21
fellows       121
fell          121
milk-woman    121
farmers       121
milk          121
field         121
money         121
milk          121
end           121
money         121
fellows       121
moment        121
milk          121
ground        121
schemes       121

Looking at Figure 1 we see that atom 260 (one of the fables) has only two valences, and atom 222 has three. By “valence” we mean here that the Analytic Conjecture has established an inference regarding how fables are related to each other via the Dictionary of tokens. This fact is reflected in Table 1.
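
The counts read off Table 1 can be checked mechanically. The sketch below is a minimal illustration in Python, not code from the OSI Browsers. It assumes that a valence is simply one (token, fable) row of the report, and the name valence_counts is hypothetical. Only the first five rows of Table 1 are used.

```python
from collections import Counter

# First five rows of Table 1 as (token, fable number) pairs.
rows = [
    ("begged", 260),
    ("protected", 260),
    ("placed", 222),
    ("fox", 222),
    ("house", 222),
]

def valence_counts(rows):
    """Count report rows per fable (assumption: one row = one valence)."""
    return Counter(fable for _token, fable in rows)

counts = valence_counts(rows)
print(counts[260], counts[222])  # prints "2 3"
```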

Looking at concept linkage and functional load

In the simple exercise that follows, we look at the concept linkage between the elements of text in a collection. The concepts are weakly represented by a collection of nouns and verbs that have been extracted from the fable collection. Functional load is to be identified through whatever means we can. The first step towards a validated theory of the functional load in the fable collection is to build a first approximation using the co-occurrence between individual fables and a unified list of nouns and verbs (called the dictionary). A datawh.txt file was produced for this purpose in the previous exercise.

A deeper study of functional load related to the fable collection can be made. One could, for example, parse the fables and identify when a noun and a verb from the Dictionary are both contained in the same fable. The events reported out to a new datawh.txt would then have the form

( noun, verb, fable name )

Then the analytic conjecture could be done using nouns as the “a” value and verbs as the “b” value. But in this Exercise, we keep things very simple for illustration purposes.
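
As a rough sketch of how such events could be produced, the Python below emits one (noun, verb, fable name) triple for every noun and verb from the Dictionary that occur in the same fable. Everything here is an assumption made for illustration: the function name noun_verb_events, the simple lowercase word matching, and the one-sentence stand-in text, which is not the text of any fable in the collection. The layout actually expected in datawh.txt is not reproduced.

```python
import re

def noun_verb_events(fables, nouns, verbs):
    """Return (noun, verb, fable name) for each noun/verb pair found in a fable."""
    events = []
    for name, text in fables.items():
        # Lowercase words, keeping hyphenated forms such as "milk-woman".
        words = set(re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()))
        for noun in sorted(words & nouns):
            for verb in sorted(words & verbs):
                events.append((noun, verb, name))
    return events

# Stand-in sentence for illustration only, not the text of fable 113.
fables = {"113": "The master killed his oxen for food."}
nouns = {"master", "oxen", "food", "jackdaw"}
verbs = {"killed", "painted"}

for event in noun_verb_events(fables, nouns, verbs):
    print(event)
```

Nouns would then play the role of the “a” value and verbs the role of the “b” value, with the fable name carried along as context.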

Table 1 is the report generated by the Technology Browser for category ‘R-D-level’ (residue at the D level).

Figure 2: (a) The Analytic Conjecture for tokens, and (b) a SLIP Framework

In working with this exercise, we have two resources. First, the fables in the collection are posted one URL at a time using the string:

(www.) + ontologystream.com/IRRTest/fables/BEAD(N).HTM

where “(N)” is replaced (manually) by the atom number. We have checked a few of these, but not all.
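
The manual substitution can be expressed as a small helper. This is a sketch only: the http:// scheme and the plain, unpadded atom number are assumptions, and fable_url is a hypothetical name. As noted above, not every URL has been checked.

```python
def fable_url(atom_number: int) -> str:
    """Build the URL for one fable by replacing (N) with the atom number."""
    # Assumption: plain http and no zero padding of the number.
    return f"http://www.ontologystream.com/IRRTest/fables/BEAD{atom_number}.HTM"

print(fable_url(191))
```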

The idea, with the fable collection, has been to prototype a BeadGame Communities software system based on work considered and made over a period of a decade by the BCNGroup.org Foundation. However, this long-term goal requires that our group make economic gains first. These gains will come from the application of the NLP Browser to the analysis of patent information and stockholder reports.

So we use the fable collection as a means to illustrate what we have by way of technology and where there are still software development issues that need to be solved.

The reader is now asked to download this exercise’s zip file, TAI.zip (289 K zipped, including the three browsers and a data set). The code is Visual Basic developed by OntologyStream.com.

After opening the Warehouse Browser (SLIPWhse.1.2.0.exe), one will see Figure 2a. Opening the Technology Browser (SLIP.2.2.3.exe) will show the user a computer interface that looks like Figure 2b.

The tree-like structure seen in Figure 2b, called the SLIP framework, is developed by taking the elements of the category A1 and randomly scattering these elements (called SLIP atoms) to the circle. One can review the numerous exercises to see how these SLIP atoms are developed. The SLIP atoms are the “a” values from the analytic conjecture that were found (by the Technology Browser) to have been paired with a “b” value. In this case, the SLIP atoms are names of fables (labeled by the fable number).

So let us see how the SLIP atoms are created.

If you have not already done this, download TAI.zip and unzip into an empty folder. You may remove and delete the contents of the Data folder except the single file datawh.txt.

Figure 3: TAI.zip unzipped

Then click on SLIPWhse.1.2.0.exe. Enter the commands “a = 1” and then “b = 0”. Then enter the command “pull”, followed by the command “export”.

You can then look into the Data folder and see that several new files exist, including Paired.txt (used by the Technology Browser) and Links (used by the Event Browser).
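
What the “pull” step does with these settings can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the Warehouse Browser's code: we assume that “a = 1” and “b = 0” select zero-based columns of each event, and that two “a” values are paired whenever they share a “b” value. The name analytic_conjecture is hypothetical, the format of Paired.txt is not reproduced, and the rows are taken from Table 1.

```python
from collections import defaultdict
from itertools import combinations

def analytic_conjecture(events, a=1, b=0):
    """Pair two 'a' values whenever they co-occur with the same 'b' value."""
    by_b = defaultdict(set)
    for event in events:
        by_b[event[b]].add(event[a])
    links = defaultdict(set)  # (a1, a2) -> the 'b' values they share
    for b_value, a_values in by_b.items():
        for pair in combinations(sorted(a_values), 2):
            links[pair].add(b_value)
    return dict(links)

# A few (token, fable number) rows from Table 1.
events = [
    ("begged", 260),
    ("fox", 222),
    ("house", 222),
    ("fox", 133),
    ("house", 236),
]
print(analytic_conjecture(events))
# prints {(133, 222): {'fox'}, (222, 236): {'house'}}
```

In this small excerpt fable 260 shares no token with another fable, so under the stated assumption it would not appear in any pair.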

Now click on SLIP.2.3.3.exe. The Technology Browser starts with an empty A1 node in the topic graph window. Enter the commands “import” and then “extract”. One can type in “help” to find out more about these functions. Then click on the A1 node.

The development of the SLIP Framework can be automated. But currently we insist on the user developing the Framework by visually finding areas of interest in the emergent clustering that occurs on the circle. Enter the command “random” and then the command “cluster”.

The cluster command will run 100,000 iterations of a seek function. For this data, with this analytic conjecture, that is too many iterations. Enter the command “random” and then “cluster 2”. Each press of Return after that should add 2,000 iterations. By the time you get to 12,000 iterations, your gather process should look something like Figure 4b.
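
For readers who want a feel for the gather process, here is a minimal Python sketch. It is not the Technology Browser's seek function, whose details are not given here. It assumes one simple rule: each iteration picks a linked pair of atoms at random and moves both a little toward each other along the shorter arc. The names scatter, gather, arc_distance, and the step parameter are hypothetical. The links are pairs of fables that share a token in Table 1.

```python
import random

def scatter(atoms, rng):
    """Place every atom at a random angle, in degrees, on the circle."""
    return {atom: rng.uniform(0.0, 360.0) for atom in atoms}

def arc_distance(x, y):
    """Length in degrees of the shorter arc between two angles."""
    return abs((x - y + 180.0) % 360.0 - 180.0)

def gather(positions, links, iterations, rng, step=0.1):
    """Pull randomly chosen linked pairs together along the shorter arc."""
    for _ in range(iterations):
        a, b = rng.choice(links)
        # Signed shorter-arc difference from a to b, in [-180, 180).
        diff = (positions[b] - positions[a] + 180.0) % 360.0 - 180.0
        positions[a] = (positions[a] + step * diff) % 360.0
        positions[b] = (positions[b] - step * diff) % 360.0
    return positions

# Fable pairs that share a token in Table 1 (fox, house, inquired, enemy, dogs).
links = [(133, 222), (222, 236), (133, 232), (112, 232), (170, 236)]
atoms = sorted({atom for pair in links for atom in pair})
rng = random.Random(0)  # fixed seed so the run is repeatable
positions = gather(scatter(atoms, rng), links, 2000, rng)
for atom in atoms:
    print(atom, round(positions[atom], 1))
```

Atoms with no links never move under this rule, which is one simple way to see why unrelated groups do not drift together.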

Figure 4 (panels a and b): Clustering the top node

We are looking for a small group of atoms. A small group helps us point out specific features of the event chemistry related to that group. We may take the middle out of the large distribution and look into the complement. To do this, you may type in a bracket command “x, y -> B1”, where x and y bracket the interior of the large distribution. Then click on the A1 node and type “residue” to put the complement of B1 into a second category.

Now click on the residue category, labeled “R”, and type “random”. Without clustering, take 90 degrees (any 90 degrees) and put the atoms into the category C1. Do this by typing, for example, “45, 135 -> C1”. Click on the new category. If you type “cluster” you will most likely see the quick formation of several groups that do not move together.
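
The bracket and residue commands amount to simple set operations on the angular positions of the atoms. The Python sketch below assumes that a bracket “x, y” selects the atoms whose angle lies between x and y degrees, and that the residue is everything in the parent category that was not selected. The names bracket and residue and the positions are hypothetical, chosen only to illustrate the “45, 135 -> C1” example.

```python
def bracket(positions, start, end):
    """Atoms whose angle lies on the arc from start to end, in degrees."""
    span = (end - start) % 360.0
    return {atom for atom, angle in positions.items()
            if (angle - start) % 360.0 <= span}

def residue(parent, child):
    """Atoms of the parent category that were not placed in the child."""
    return set(parent) - set(child)

# Hypothetical positions (degrees) for four of the fables in Table 1.
positions = {260: 50.0, 222: 120.0, 232: 200.0, 133: 310.0}
c1 = bracket(positions, 45, 135)
print(sorted(c1))                      # prints [222, 260]
print(sorted(residue(positions, c1)))  # prints [133, 232]
```

Taking the angle modulo 360 lets a bracket wrap past the top of the circle, for example from 300 to 30 degrees.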

Choose the largest of these (hopefully you will have at least two and fewer than five elements in this group). If not, then go to the Data folder, delete the subfolders of A1, and type “load” into the command line. This will allow you to start over.

Once you have a category with two to five elements, type in “key = 1”, click on the Report button, and type “generate” in the command line.

Figure 5: the Report for a category of two atoms.

You will have different results than I do. However, you have captured two to five atoms that

1) Are all connected by the co-occurrence of one or more tokens and

2) Have an average number of valences that connect to atoms outside the category.

The two fables that I have identified here are (191) The Jackdaw and the Doves, and (113) The Master and His Dogs. The linkage is via the token “food”.

Table 2: The valences of 191 and 113

token         name
seeing        191
painted       191
joined        191
share         191
discovered    191
recognizing   191
desiring      191
jackdaw       191
food          191
day           191
character     191
food          191
jackdaws      191
ends          191
killed        113
obliged       113
seeing        113
took          113
own           113
master        113
dogs          113
storm         113
country       113
house         113
goats         113
household     113
storm         113
yoke          113
oxen          113
food          113
dogs          113
counsel       113
time          113
master        113
oxen          113
friend        113
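
The linkage between the two fables can be read off Table 2 with a set intersection. The Python below is only a sketch over the rows as printed in the table, and shared_tokens is a hypothetical name. On these rows, the token “seeing” is listed under both fables, as is “food”.

```python
# Distinct tokens listed in Table 2 for each fable.
tokens_191 = set(
    "seeing painted joined share discovered recognizing desiring "
    "jackdaw food day character jackdaws ends".split()
)
tokens_113 = set(
    "killed obliged seeing took own master dogs storm country house "
    "goats household yoke oxen food counsel time friend".split()
)

def shared_tokens(first, second):
    """Tokens listed under both fables, in alphabetical order."""
    return sorted(first & second)

print(shared_tokens(tokens_191, tokens_113))  # prints ['food', 'seeing']
```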