Diagrams illustrating the disambiguation issue
Ontology Stream Inc 9/21/03
Software at:
www.ontologystream.com/cA/tutorials/disambiguation.zip
In the detection of memetic expression, one often develops an assumption of relationship based on various means. The results of the algorithms are often delivered as a set of ordered triples
{ < a, r, b > }
where a and b are words or phrases and r is a non-specific relationship.
For example:
Figure 1: Two sets of relationships that share the same subject indicator, “bush”
subject(i) subject(j)
bush(1) relationship1
leader relationship1
war relationship1
taxes relationship1
garden relationship1
subject(i) subject(j)
bush(2) relationship2
plant relationship2
ground relationship2
leaves relationship2
green relationship2
garden relationship2
The data in the above table has the form necessary for the SLIP browsers to produce the event chemistry from an aggregation of categorical invariance into categories.
The sets of derived relationships are
{ <bush(2), r, garden>, <bush(2), r, green>, <bush(2), r, ground>,
<bush(2), r, leaves>, <bush(2), r, plant> }
and
{ <bush(1), r, leader>, <bush(1), r, war>,
<bush(1), r, taxes>, <bush(1), r, garden> }
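As a minimal sketch, the derivation of these triples from the Figure 1 data can be written in a few lines of Python. The names Triple, categories and derive_triples are hypothetical and chosen for illustration only; they are not part of the SLIP software.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """An ordered triple < a, r, b > with a non-specific relationship r."""
    a: str
    r: str
    b: str


# Figure 1 data: each subject indicator and the atoms related to it.
categories = {
    "bush(1)": ["leader", "war", "taxes", "garden"],
    "bush(2)": ["plant", "ground", "leaves", "green", "garden"],
}


def derive_triples(subject, atoms, relation="r"):
    """Return the set { < subject, r, atom > } for every atom in the category."""
    return {Triple(subject, relation, atom) for atom in atoms}


# One set of derived triples per subject indicator.
triples = {subject: derive_triples(subject, atoms)
           for subject, atoms in categories.items()}
```

Storing each result as a set mirrors the set notation used above and makes the order of the atoms irrelevant.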
In Figure 2 we notice that the string “garden” co-occurs in both contexts: bush the president and bush the plant.
Figure 2: Intersection between the atoms of two categories
From Figure 2 one can make the plausible inference that one occurrence of the term “garden” has the subject indicator “the President’s rose garden”.
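The intersection between the atoms of the two categories can be sketched as follows, assuming the Figure 1 data is held as plain Python sets. The function name shared_atoms is hypothetical. The sketch only finds the ambiguous atoms; deciding which subject indicator a given occurrence belongs to is a separate step.

```python
# Figure 1 data: atoms attached to each sense of "bush".
categories = {
    "bush(1)": {"leader", "war", "taxes", "garden"},
    "bush(2)": {"plant", "ground", "leaves", "green", "garden"},
}


def shared_atoms(categories):
    """Map each atom that occurs in more than one category to its subjects."""
    owners = {}
    for subject, atoms in categories.items():
        for atom in atoms:
            owners.setdefault(atom, set()).add(subject)
    return {atom: subjects for atom, subjects in owners.items()
            if len(subjects) > 1}


shared = shared_atoms(categories)
# Only "garden" appears under both bush(1) and bush(2).
```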
The information in Figure 1 is given in a slightly different form in Figure 3.
The method of disambiguation takes the ending nodes of relationship 1 and relationship 2 as the basis for using the Prueitt Voting Procedure:
http://www.bcngroup.org/area3/pprueitt/kmbook/Appendix.htm
The application of the Prueitt Voting Procedure allows a two-step process.
The first step is a human sorting of graph branches into bins, using measures of local linguistic variation obtained from a word-level n-gram measurement process. A human-on-the-loop is judged absolutely necessary for high-quality memetic detection.
http://www.bcngroup.org/area2/KSF/HIP.htm
The second step is an autonomous routing of the branches produced by a word-level n-gram measurement process into the categories that correspond to the subject indicators for which the bins were developed.
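The routing step can be illustrated with a rough sketch. This is not the Prueitt Voting Procedure itself, which is described at the link above; it is a simple overlap count used as a stand-in, under the assumption that each bin is represented by the set of words a human has sorted into it. The names word_ngrams, route_branch and bins are hypothetical.

```python
from collections import Counter


def word_ngrams(text, n=1):
    """Return the word-level n-grams of a text as a list of tuples."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def route_branch(text, bins):
    """Route a branch to the bin whose words it overlaps most.

    Each bin casts one vote per shared word (1-grams only in this sketch).
    Returns None when no bin receives a vote or the top two bins tie, so
    that the branch can be handed back to a human.
    """
    words = {gram[0] for gram in word_ngrams(text, 1)}
    votes = Counter({label: len(words & vocab)
                     for label, vocab in bins.items()})
    ranked = votes.most_common(2)
    if not ranked or ranked[0][1] == 0:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Bins as a human might have filled them from the Figure 1 data.
bins = {
    "bush(1)": {"leader", "war", "taxes", "garden"},
    "bush(2)": {"plant", "ground", "leaves", "green", "garden"},
}

routed = route_branch("green leaves in the garden", bins)
undecided = route_branch("garden", bins)
```

Here the first branch is routed to bush(2), while the single word “garden” ties between the two bins and is left undecided, which matches the ambiguity noted for Figure 2.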