Bead 4
January 1st, 2001
OntologyStream Fable Indexing Project
Bead 4
January 11th, 2001, Fable Arithmetic Link (in progress)
On the automated generation of natural language by machine
Dr. Paul S. Prueitt
Knowledge Scientist at Acappella Software
Public Research Note
January 11, 2001
Automated linguistic analysis of text is available from several mature
sources, whereas the automated generation of literary text is not. The two
processes appear to be inverses of each other; however, no existing system for
the automation of linguistic analysis is reversible.
Statistics does not supply the methods needed to support high-quality text
analysis. Statistics is a generalization from the many to the one, whereas
natural language also functions as a one-to-many process. That process is
incompatible with the way in which statistics has formalized the many-to-one
relationship.
A theoretical framework for both processes is available by using a finite
state machine. However, natural language has evolved to match the full reality
of knowledge creation and sharing processes. Due to this rich match, the
natural languages are natural, whereas the formal systems are quite
artificial. The difference is something that my colleagues and I are working
to make publicly understandable, given the current excessive interest in
artificial intelligence. Since the topic has many aspects, I can only refer
the reader to my written work at: www.bcngroup.org.
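As a minimal sketch of the finite state machine idea, the toy machine below is walked in both directions: once to analyze a word sequence and once to generate one. The states, the vocabulary, and the function names are hypothetical and chosen only for illustration. A machine this small is reversible precisely because it is artificial; it says nothing about whether a system covering real natural language could be.

```python
import random

# Hypothetical toy machine: state -> list of (word, next_state) pairs.
TRANSITIONS = {
    "START": [("the", "NOUN")],
    "NOUN": [("child", "VERB"), ("clinician", "VERB")],
    "VERB": [("speaks", "END"), ("listens", "END")],
}


def accepts(words):
    """Analysis direction: follow the machine along the given words."""
    state = "START"
    for word in words:
        next_states = [s for w, s in TRANSITIONS.get(state, []) if w == word]
        if not next_states:
            return False  # no transition for this word in this state
        state = next_states[0]
    return state == "END"


def generate(rng):
    """Generation direction: walk the same machine, choosing a word at each state."""
    state, words = "START", []
    while state != "END":
        word, state = rng.choice(TRANSITIONS[state])
        words.append(word)
    return words


sentence = generate(random.Random(0))
print(" ".join(sentence), accepts(sentence))
```

Every sequence the machine generates is also accepted by it, which is the sense in which the two processes are inverses within this small formal system.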
The problem of disambiguation is perhaps the easiest one to use in pointing
out the severe limitations of the traditional expert system and AI approaches
to machine management of the context of textual data.
Vector quantification, on the other hand, can be used to show how the
traditional approach has some merit. Data compression algorithms have been,
and can be, used in a data-understanding framework, as in my work on
fractal-based understanding of mammography: http://www.bcngroup.org/admin/CIL/core/Index.htm
The Hidden Markov Model (HMM) approach has been developed extensively and has
many of the positive qualities of vector quantification. HMM also has all of
the limitations of a statistical approach.
The Acappella Innovation walks away from the traditional effort to represent
human knowledge in the form of rules. The Innovation demands that one develop
a universe of discourse by descriptively enumerating the topics that define
categories corresponding to the aspects of that universe of discourse. The
topics are arranged in a natural fashion, in a way that mirrors the expression
of written language in sections, subsections, paragraphs, sentences, phrases
and words. The exact form of the topic / question hierarchy is not so
important as the expressive value of the tokens when viewed by a human mind.
Many different linguistic expressions are sufficient to communicate almost
exactly the same knowledge. In the same way, a topic ontology reaches a degree
of sufficiency without imposing a crisp set of interpretations.
The questions are attached to the topics in order to create an evocative
condition and thus to allow knowledge sharing in a semi-structured fashion.
Again, this is very natural, and is accomplished in very much the same way as
natural language is used in social discourse.
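The topic / question hierarchy described above can be pictured as a small tree in which each topic carries its own questions and its own subtopics. The sketch below is only an illustration under that assumption: the `Topic` class, the `walk` function, and the example topics and questions are hypothetical and are not taken from any Acappella product.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """One node in a hypothetical topic / question hierarchy."""
    name: str
    questions: list = field(default_factory=list)
    subtopics: list = field(default_factory=list)


def walk(topic, depth=0):
    """Yield (depth, topic name, question) triples in document order."""
    for question in topic.questions:
        yield depth, topic.name, question
    for sub in topic.subtopics:
        yield from walk(sub, depth + 1)


# Hypothetical universe of discourse, enumerated descriptively.
universe = Topic("Speech assessment", subtopics=[
    Topic("Articulation", questions=["How clearly are consonants produced?"]),
    Topic("Fluency", questions=["How often does the speaker hesitate?"]),
])

for depth, name, question in walk(universe):
    print("  " * depth + f"{name}: {question}")
```

Nothing in the structure fixes how a question must be interpreted; it only arranges the topics and the questions that evoke a response.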
The natural language generation comes about, also very naturally, by working
out the details of how sentences can be formed within these linguistic forms.
Phrases are composed within a model of how the topics might be responded to
during the use of the Acappella Generated Ontology. Such a use is called an
assessment in the speech and language assessment tool called WordWeaver
(www.wordweaver.com). The set of all possible phrase compositions has to be
worked out in advance by someone who is trained to do this type of work. It is
time-consuming, but when completed it works very well, as can be seen from the
WordWeaver product line.
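One way to picture phrase composition worked out in advance is a lookup table from (topic, response) pairs to phrases, which are then joined into a sentence. This is a minimal sketch under that assumption and not a description of how WordWeaver works; the `PHRASES` table, the `compose` function, and the example topics and responses are all hypothetical.

```python
# Hypothetical phrase table, written in advance by a trained author.
PHRASES = {
    ("Articulation", "clear"): "produces consonants clearly",
    ("Articulation", "unclear"): "has difficulty producing some consonants",
    ("Fluency", "rare"): "rarely hesitates",
    ("Fluency", "frequent"): "hesitates frequently",
}


def compose(subject, responses):
    """Build one sentence from the phrases selected by the responses."""
    parts = [PHRASES[(topic, answer)] for topic, answer in responses]
    if not parts:
        return ""
    if len(parts) == 1:
        body = parts[0]
    else:
        body = ", ".join(parts[:-1]) + " and " + parts[-1]
    return f"{subject} {body}."


print(compose("The student", [("Articulation", "clear"), ("Fluency", "frequent")]))
# prints: The student produces consonants clearly and hesitates frequently.
```

The effort sits in authoring the table: every response that might be given to a topic needs a phrase that reads well in combination with the others.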
The capacity to generate reasonable narrative allows, in microcosm, for the
evolution of a machine-to-human discourse. This discourse will be an illusion,
of course. However, to the degree that knowledge within communities can be
vetted using ontology streaming, this illusion will serve the useful purpose
of conveying real knowledge.
I know of one group in Australia that is seriously working on natural language generation using finite state machines and state gesture response pairing. However, this work is not public and has support within an academic community.