We ask that comments and discussion on this series be made in the Yahoo groups forum, eventChemistry.
APC1 – On enumeration by the human of the cell values over an event class
APC2 – Minimal Voting Procedure (original paper)
APC3 – Using the MVP to route information
APC4 – Using eventChemistry to improve the framework specification
Action Perception Cycles 2
Minimal Voting Procedure
Directly from the Appendix, Foundations of Knowledge Science
Description of the Minimal Voting Procedure (MVP)
First Published, in Russia: 1997
(original notation)
Appendix: Description of the Minimal Voting Procedure (MVP)
To instantiate a voting procedure, we need the following triple < C, O1, O2 >:
· A set of categories C = { Cq } as defined by a training set O1.
· A means to produce a document representational set for members of O1.
· A means to produce a document representational set for members of a test set, O2.
We assume that we have a training collection O1 with m document passages,
O1 = { d1 , d2 , . . . , dm }
Documents that are not single
passages can be substituted here. The notion introduced above can be
generalized to replace documents with a more abstract notion of an
"object".
Objects
O = { O1
, O2 , . . . , Om }
can be documents, semantic passages
that are discontinuously expressed in the text of documents, or other classes
of objects, such as electromagnetic events, or the coefficients of spectral
transforms.
Some representational procedure is used to compute an "observation" Dr about the semantics of the passages. The subscript r is used to remind us that various types of observations are possible and that each of these may result in a different representational set. For linguistic analysis, each observation produces a set of theme phrases. We use the following notation to indicate this:
Dr : di → { t1 , t2 , . . . , tn }
This notation is read "the observation Dr of the passage di produces the representational set { t1 , t2 , . . . , tn }".
We now combine these passage level representations to form a category representation.
· Each "observation", Dr, of the passages in the training set O1 has a "set" of theme phrases
Dr : di → Tk = { t1 , t2 , . . . , tn }
· Let A be the union of the individual passage representational sets Tk,
A = ∪ Tk.
This set A is the representation set for the complete training collection O1.
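As a sketch of the construction so far (in Python): the `observe` function below is an invented stand-in for a real theme-phrase extractor (the observation Dr), and the sample passages are purely illustrative.

```python
# Minimal sketch of the representational sets.  The "observation" D_r is
# modeled here as a simple word-set extractor; any real theme-phrase
# extractor could be substituted without changing the structure.
def observe(passage):
    """D_r : d_i -> { t_1, ..., t_n } (here: the set of lowercased words)."""
    return set(passage.lower().split())

# Training collection O1 = { d_1, ..., d_m } (illustrative passages)
O1 = ["the cat sat", "the dog ran", "a cat ran"]

# Per-passage representational sets T_k
T = [observe(d) for d in O1]

# A = union of all T_k : the representation set for the whole collection
A = set().union(*T)
print(sorted(A))
```

Any richer extraction (noun phrases, themes) slots in by replacing `observe`; the union step is unchanged.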
· The overlap between category representations Tq and Ts is one statistical measure of the "cognitive entanglement" between category q and category s. This fact will lead to a method for identifying the minimal intersections of structural features of the category representational sets.
J. S. Mill’s logics rely on the discovery of meaningful subsets of representational elements. The first principles of J. S. Mill’s argumentation are:
1. that negative evidence should be acquired as well as positive evidence
2. that a bi-level argumentation should involve a decomposition of passages and categories into a set of representational phrases
3. that the comparison of passage and category representation should generalize (provide the grounding for computational induction) from the training set to the test set.
It is assumed that each "observation",
Dk, of the test set O2 is composed from a "set" of
basic elements, in this case the theme phrases in A. Subsets of the set
are composed, or aggregated, into wholes that are meaningful in a context that
depends only statistically on the characteristics of basic elements.
The general framework provides for
situational reasoning and computational argumentation about natural systems.
For the time being, it is assumed that the set of basic elements is the full phrase representational set
A = ∪ Tk
for the training collection O1.
We introduce the notation for the derived representational sets T*q for each Cq , q = 1, . . . , n.
Given the data:
T*q for each Cq , q = 1, . . . , n, and
the representational sets Tk , from the observations Dk, for each passage, dk, from the test set O2,
we generate the hypothesis that the observation Dk should be categorized into category q.
This hypothesis is voted on using each phrase in the representational set for Dk, by making the following inquiries for each element ti of the representational set Tk:
1. does an observation of a passage, Dk, have the property p, where p is the property that this specific representational element, ti , is a member of the representational set T*q for category q?
2. does an observation of a passage, Dk, have the property p, where p is the property that this specific representational element, ti , is not a member of the representational set T*q for category q?
Truth of the first inquiry produces
a positive vote, from the single passage level representational element, that the
passage is in the category. Truth of the second inquiry produces a negative
vote, from the single representational element, that the passage is not
in the category. These votes are tallied.
Data structure for recording the
votes
For each passage, dk , we define the matrix Ak as a rectangular matrix of size m x h, where m is the size of the specific passage representational set Tk, and h is the number of categories. The passages are indexed by k; each passage has its own matrix.
Each element ti of Tk,
will get to vote for or against the hypothesis that this kth passage should be
in the category having the category representational set T*q.
Thus Ak is defined by the rule (where column j corresponds to category representational set T*j):
ai,j = +1 if the phrase ti is in T*j
or
ai,j = -1 if the phrase ti is not in T*j
Matrix Ak is used to store the individual +/- votes placed by each agent (i.e., each representational element of the phrase representation of the passage).
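A minimal Python sketch of this data structure, with invented phrase and category sets standing in for Tk and the T*q:

```python
# Sketch: the vote matrix A_k for one test passage.  Tk is the passage's
# representational set; Tstar is the list of category representational
# sets T*_1 ... T*_h.  All concrete values below are illustrative.
def vote_matrix(Tk, Tstar):
    """Return A_k as a list of rows: a[i][j] = +1 if phrase t_i is in
    T*_j (a positive vote), otherwise -1 (a negative vote)."""
    return [[1 if t in Tq else -1 for Tq in Tstar] for t in Tk]

Tk = ["cat", "ran"]                       # phrases observed in passage d_k
Tstar = [{"cat", "sat"}, {"dog", "ran"}]  # two category sets T*_1, T*_2
Ak = vote_matrix(Tk, Tstar)
print(Ak)   # one row per phrase, one column per category
```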
This linear model produces ties for first
place, and places a semi-order (having ties for places) on the categories by
counting discrete votes for and against the hypothesis that the document is in
that category.
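To make the ties concrete, here is a small Python sketch with an invented four-phrase, two-category vote matrix whose column tallies come out equal:

```python
# Why the linear (unweighted) model produces ties: two categories that
# receive the same net vote count end in the same place.
Ak = [[1, 1], [-1, -1], [1, -1], [-1, 1]]   # 4 phrases, 2 categories
tallies = [sum(row[j] for row in Ak) for j in range(2)]
print(tallies)   # both categories tally 0: a tie for first place
```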
A second data structure to record
weighted votes
A non-linear (weighted) model uses
internal and external weighting to reduce the probability of ties to near zero
and to account for structural relationships between themes.
Matrix Bk is defined by the rule (where column j corresponds to category representational set T*j):
bi,j = ai,j * (weight of the phrase ti in Tk) if ti is not in T*j
or
bi,j = ai,j * (weight of the phrase ti in T*j) if ti is in T*j
This difference between the two
multipliers is necessary and sufficient to break ties resulting from the linear
model (matrix Ak).
Data structure to record the
results
For each passage representation and each
category, the tally is made from the matrix Bk and stored in a
matrix C having the same number of records as the size of the document
collection, and having h columns – one column for each category.
The information in matrix C is
transformed into a matrix D having the same dimension as C. The elements of
each row in C are reordered by the tally values. To illustrate, suppose we have
only 4 categories and passage 1 tallies {-1214,-835,451,1242} for categories 1,
2, 3 and 4 respectively. So
cat1 → -1214, cat2 → -835, cat3 → 451, and cat4 → 1242.
By holding these assignments constant and ordering the elements by size of tally we have the permutation of the ordering ( 1, 2, 3, 4) to the ordering ( 4, 3, 2, 1).
( 1, 2, 3, 4) → ( 4, 3, 2, 1).
This result shows that for passage 1, the first place placement is category 4, the second place is category 3, etc. The matrix D would then have (4, 3, 2, 1) as its first row.
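The tally-and-reorder step can be sketched in Python, reusing the four-category tallies from the example above:

```python
# Sketch: ranking the tally matrix C into the ordering matrix D.
# Each row of C holds one passage's per-category tallies; the matching
# row of D lists the (1-based) categories in descending order of tally.
C = [[-1214, -835, 451, 1242]]   # tallies for passage 1, categories 1..4

D = []
for row in C:
    ranked = sorted(range(1, len(row) + 1),
                    key=lambda j: row[j - 1], reverse=True)
    D.append(ranked)
print(D[0])   # [4, 3, 2, 1]: category 4 takes first place
```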