Diagrams illustrating the disambiguation issue

Ontology Stream Inc 9/21/03

Software at:

www.ontologystream.com/cA/tutorials/disambiguation.zip

In the detection of memetic expression, one often develops an assumption of relationship by various means. The results of algorithms are often delivered as a set of ordered triples

{ < a, r, b > }

where a and b are words or phrases and r is a non-specific relationship.

For example:

Figure 1: Two sets of relationships that have the same subject indicator, “bush”

subject(i)    subject(j)
bush(1)       relationship1
leader        relationship1
war           relationship1
taxes         relationship1
garden        relationship1

subject(i)    subject(j)
bush(2)       relationship2
plant         relationship2
ground        relationship2
leaves        relationship2
green         relationship2
garden        relationship2

The data in the tables above have the form necessary for the SLIP browsers to produce the event chemistry from an aggregation of categorical invariance into categories.

The sets of derived relationships are

{ <bush(2), r, garden>, <bush(2), r, green>, <bush(2), r, ground>, <bush(2), r, leaves>, <bush(2), r, plant> }

and

{ <bush(1), r, leader>, <bush(1), r, war>, <bush(1), r, taxes>, <bush(1), r, garden> }
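
The derivation of these triples from the two-column data of Figure 1 can be sketched in a few lines of Python. This is only an illustration, not the SLIP software itself: the function name derive_triples is hypothetical, and the sketch assumes that the first term listed under each relationship label is the subject of every triple in that group.

```python
from collections import defaultdict

# Rows transcribed from Figure 1: (term, relationship label).
ROWS = [
    ("bush(1)", "relationship1"),
    ("leader", "relationship1"),
    ("war", "relationship1"),
    ("taxes", "relationship1"),
    ("garden", "relationship1"),
    ("bush(2)", "relationship2"),
    ("plant", "relationship2"),
    ("ground", "relationship2"),
    ("leaves", "relationship2"),
    ("green", "relationship2"),
    ("garden", "relationship2"),
]


def derive_triples(rows):
    """Group terms by relationship label and link each group's first
    term (assumed to be the subject) to the remaining terms."""
    groups = defaultdict(list)
    for term, label in rows:
        groups[label].append(term)

    triples = {}
    for label, terms in groups.items():
        subject, others = terms[0], terms[1:]
        # "r" stands for the non-specific relationship of the text.
        triples[label] = {(subject, "r", other) for other in others}
    return triples


TRIPLES = derive_triples(ROWS)
for label in sorted(TRIPLES):
    print(label, sorted(TRIPLES[label]))
```

Sets are used because the order of the triples within each group carries no meaning in the text.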

In Figure 2 we notice that the string “garden” co-occurs in both contexts: bush the president and bush the plant.

Figure 2: Intersection between the atoms of two categories

From Figure 2 one can make the plausible inference that one occurrence of the term “garden” has the subject indicator “the President’s rose garden”.
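
The intersection shown in Figure 2 can be reproduced with a plain set operation. The sketch below is illustrative only; the variable names are hypothetical, and the atoms are taken directly from the two tables of Figure 1.

```python
# Atoms linked to each sense of "bush" in Figure 1.
ATOMS_BUSH_1 = {"leader", "war", "taxes", "garden"}
ATOMS_BUSH_2 = {"plant", "ground", "leaves", "green", "garden"}

# Atoms that occur in both categories are the ambiguous ones.
SHARED = ATOMS_BUSH_1 & ATOMS_BUSH_2

# Atoms that occur in only one category identify a single sense.
ONLY_BUSH_1 = ATOMS_BUSH_1 - ATOMS_BUSH_2
ONLY_BUSH_2 = ATOMS_BUSH_2 - ATOMS_BUSH_1

print("shared:", sorted(SHARED))
print("only bush(1):", sorted(ONLY_BUSH_1))
print("only bush(2):", sorted(ONLY_BUSH_2))
```

Any atom in the shared set needs further context before it can be assigned to one subject indicator.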

The information in Figure 1 is given in a slightly different form in Figure 3.

The method of disambiguation takes the ending nodes of relationship 1 and relationship 2 as the basis for using the Prueitt Voting Procedure:

http://www.bcngroup.org/area3/pprueitt/kmbook/Appendix.htm

The application of the Prueitt Voting Procedure allows a two-step process.

The first step is a human sorting of graph branches into bins, using measures of local linguistic variation from a word-level n-gram measurement process. A human-on-the-loop is judged absolutely necessary for high-quality memetic detection.

http://www.bcngroup.org/area2/KSF/HIP.htm

The second step is an autonomous routing of branches produced by a word-level n-gram measurement process into the categories that correspond to the subject indicators for which the bins were developed.
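
The routing step can be illustrated with a rough Python sketch. It is not the Prueitt Voting Procedure, whose details are given in the appendix linked above; a plain overlap count stands in for the vote. The bins are assumed to have been filled by hand in the first step, here with the atoms of Figure 1, and the names word_ngrams and route, the window size n = 5, and the sample sentences are all hypothetical.

```python
from collections import Counter

# Bins assumed to result from the human sorting step (atoms of Figure 1).
BINS = {
    "bush(1)": {"leader", "war", "taxes", "garden"},
    "bush(2)": {"plant", "ground", "leaves", "green", "garden"},
}


def word_ngrams(text, n):
    """Return all word-level n-grams of a whitespace-tokenized text."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def route(text, target, bins, n=5):
    """Route an occurrence of `target` to the bin whose atoms co-occur
    most often with it inside word-level n-grams. Returns None when no
    bin receives a vote or when the top two bins are tied."""
    votes = Counter()
    for gram in word_ngrams(text, n):
        if target not in gram:
            continue
        for label, atoms in bins.items():
            # The target itself is in both bins, so it casts no vote.
            votes[label] += sum(1 for w in gram if w != target and w in atoms)

    ranked = votes.most_common(2)
    if not ranked or ranked[0][1] == 0:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


print(route("the green leaves in the garden need water", "garden", BINS))
print(route("the leader spoke about war and taxes from the garden today",
            "garden", BINS))
```

Returning None on a tie or an empty vote leaves the undecided cases for human review, in keeping with the first step. The tokenizer splits on whitespace only, so punctuation would need to be stripped in real text.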