December 1st, 2000 note to Alex Zenkin
December 12th, 2000 note from Alex Zenkin, linked via the beads below (denoted bead1 @ [AZ])
Alex,
Attempting to generalize the CCG technique from number theory to other domains exposes some very valuable problems.
I have started along the path that an analysis of CCG techniques requires. The domains I am considering are:
(1) astronomical data - since certain classes of astronomical data have very regular patterns;
(2) analysis of machine-generated fractal data - since identifying regular patterns here involves a problem: the invariance does not hold "from some point on" but is seen only after a proper folding and rescaling of the data;
(3) EEG data - since this data is of great scientific interest;
(4) textual data.
Since this document was first written, I have placed 312 Aesop fables into a FoxPro database. I have written text parsers and can create and edit inverted indexes. The next step is to scatter points onto a circle, each point representing a full fable. The scatter is meant to eliminate all semantics through a random process. The gather then moves the points closer together or further apart based on a semantic interpretation.
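The scatter and gather steps could be sketched as follows. This sketch assumes that a pairwise semantic score in [-1, 1] is already available for every pair of fables; how that score is obtained is the subject of the next paragraph. The function names (`scatter`, `gather_step`, `gather`, `angular_distance`) are hypothetical. The gather rule is also an assumption, not the author's method. It moves each point along the shorter arc in proportion to sin of the angular gap, weighted by the score, which is a coupling borrowed from Kuramoto-style oscillator models.

```python
import math
import random

TWO_PI = 2.0 * math.pi


def scatter(n, seed=None):
    """Scatter n points (one per fable) at uniformly random angles on the unit circle.

    The random placement carries no semantic information.
    """
    rng = random.Random(seed)
    return [rng.uniform(0.0, TWO_PI) for _ in range(n)]


def angular_distance(a, b):
    """Shortest arc length between two angles on the unit circle, in [0, pi]."""
    d = (b - a) % TWO_PI
    return min(d, TWO_PI - d)


def gather_step(angles, sim, rate=0.05):
    """One round of local evaluations over every pair of fables.

    sim[i][j] is a HYPOTHETICAL pairwise semantic score in [-1, 1]:
    positive scores pull fables i and j together, negative scores push them apart.
    The sign of sin(angle_j - angle_i) selects the shorter arc toward j,
    so wrap-around at 2*pi is handled automatically.
    """
    n = len(angles)
    updated = []
    for i in range(n):
        force = sum(sim[i][j] * math.sin(angles[j] - angles[i])
                    for j in range(n) if j != i)
        updated.append((angles[i] + rate * force) % TWO_PI)
    return updated


def gather(angles, sim, steps=200, rate=0.05):
    """Repeat many local gather rounds; any global arrangement emerges from them."""
    for _ in range(steps):
        angles = gather_step(angles, sim, rate)
    return angles
```

For example, once a 312-by-312 `sim` matrix exists, `gather(scatter(312, seed=1), sim)` would arrange the fables. Any rule that moves similar fables together and dissimilar ones apart could replace the sine coupling.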
The semantic interpretation can be produced algorithmically using simplified latent semantic indexing or a similar technique. In effect, the semantics is determined pairwise from the correlation of word frequencies. The semantics could also be imposed by visually inspecting pairs of fables and judging their closeness or difference. Scatter-gather methods often involve many local evaluations before an emergent global topology is established.
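A minimal sketch of the pairwise idea in this paragraph, correlation of word frequencies, might look like this. It computes a plain Pearson correlation of raw word counts. It does not perform the singular value decomposition used in full latent semantic indexing. The tokenizer, the function names, and the choice to correlate over the union of the two fables' vocabularies are illustrative assumptions, not the FoxPro parsers mentioned above.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase word-frequency vector for one fable (a crude stand-in for a real parser)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def correlation(c1, c2):
    """Pearson correlation of two word-frequency vectors.

    ASSUMPTION: computed over the union of the two vocabularies only;
    a corpus-wide vocabulary would give different values.
    """
    vocab = sorted(set(c1) | set(c2))
    n = len(vocab)
    if n == 0:
        return 0.0
    x = [c1[w] for w in vocab]  # Counter returns 0 for absent words
    y = [c2[w] for w in vocab]
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant vector has no defined correlation
    return cov / math.sqrt(vx * vy)


def similarity_matrix(texts):
    """Pairwise scores in [-1, 1] for a list of fable texts."""
    counts = [word_counts(t) for t in texts]
    n = len(counts)
    return [[1.0 if i == j else correlation(counts[i], counts[j])
             for j in range(n)] for i in range(n)]
```

Fables that share word-use patterns score near +1, and fables with disjoint vocabularies score negative. A score of this kind could drive the gather step.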
At this point, the fundamental problem is exposed. Visualization into a regular grid, as is done in the CCG application to number theory, is not productive in other domains. No adjustment to the algorithm for producing CCG images can be found that overcomes the following problems.
(1) The principle of super-induction cannot be applied to criteria that exist beyond a certain point in the ordering of the data. In fact, the non-number-theoretic data is not even increasing as a function of its index.
(2) The generators for number-theoretic theorems are (a) the Peano axioms, (b) the additive property, and (c) the multiplicative property. The CCG grid accounts exactly, and only, for these generators. The generators for text semantics have yet to be determined.
(3) In discussions with Peter Kugler, I have come to understand the exposed problems a little better. The generators for text semantics might be discoverable on a situational basis if one is aware of the measurement problem.
Here is my conclusion. The specific algorithm developed for CCG visualization of number theory will not be useful for visualizing "natural" data sets, because the generators of these data sets are not represented in the CCG grid. However, the visualization grid is not the core of the CCG paradigm. The core is the principle of super-induction.
I take the principle of super-induction to be a principle that allows criteria to be established such that, once these criteria are seen visually (or otherwise), some other truth can be inferred by induction.
Now, having come to this new understanding, I reaffirm my original naïve belief that the CCG paradigm can be generalized to data sets other than the number sequences from number theory.
Alex Zenkin continued to point the way in his references to Vladimir Lefebvre's iconic tokens. I also make a connection to a paper written by D. Pospelov on "oppositional scales". Using iconic representations of some nominated set of oppositional scales, one may move scattered points on an n-sphere into clusters. The clusters are then good candidates for categories of meaning.
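One way the n-sphere gather might be sketched rests on several loud assumptions, none of them taken from Lefebvre's or Pospelov's work:

- Each fable's iconic representation is reduced to a hypothetical numeric profile of scores on the nominated oppositional scales (for instance, wise/foolish or strong/weak).
- Two fables attract when their profiles agree (positive cosine) and repel when they oppose.
- Clusters are read off by a simple dot-product threshold.

Points are unit vectors in R^dim, that is, on the (dim - 1)-sphere. All function names are illustrative.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Project a vector back onto the unit sphere."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [c / norm for c in v]


def cosine(p, q):
    """Agreement of two HYPOTHETICAL oppositional-scale profiles, in [-1, 1]."""
    norm_p, norm_q = math.sqrt(dot(p, p)), math.sqrt(dot(q, q))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot(p, q) / (norm_p * norm_q)


def scatter_on_sphere(n, dim, seed=None):
    """Random scatter: normalized Gaussian vectors are uniform on the sphere."""
    rng = random.Random(seed)
    return [normalize([rng.gauss(0.0, 1.0) for _ in range(dim)]) for _ in range(n)]


def gather_on_sphere(points, profiles, steps=300, rate=0.1):
    """Move each point toward agreeing fables and away from opposing ones."""
    n = len(points)
    s = [[cosine(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    for _ in range(steps):
        updated = []
        for i in range(n):
            pull = [0.0] * len(points[i])
            for j in range(n):
                if j != i:
                    for k, c in enumerate(points[j]):
                        pull[k] += s[i][j] * c
            # Step along the pull, then return to the sphere.
            updated.append(normalize([x + rate * p for x, p in zip(points[i], pull)]))
        points = updated
    return points


def clusters(points, threshold=0.9):
    """Greedy grouping: a point joins the first group whose first member it nearly aligns with."""
    groups = []
    for i, p in enumerate(points):
        for g in groups:
            if dot(points[g[0]], p) > threshold:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups
```

The resulting groups would be the candidate categories of meaning. A more careful version would replace the numeric profiles with whatever the iconic tokens actually encode.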
I just have to do a bit more work on this before sending a final report to the Army Research Lab.
Having further outlined my own efforts to complete our ARL project on time, I restate once more my feeling that Alex Zenkin is far more qualified to work on this exciting generalization than I am. In any case, my own work will be incomplete for some time to come. So I would greatly prefer to forward a report from Alex to ARL rather than submit my own original work.