


OntologyStream Fable Indexing Project

The Fable Collection

Bead 4

January 11th, 2001

Chantilly, Virginia

 

Fable Arithmetic Link (in progress)

 

 

On the automated generation of natural language by machine

 

Dr. Paul S. Prueitt

Knowledge Scientist at Acappella Software

Public Research Note

January 11, 2001

 

Whereas automated linguistic analysis of text is available from several mature sources, the automated generation of literary text is not. The two processes appear to be inverses of each other; however, no existing system for the automation of linguistic analysis is reversible.

 

Statistics does not supply the proper methods needed to support high-quality text analysis. Statistics is a generalization from the many to the one, whereas natural language also functions in a one-to-many process that is incompatible with the way in which statistics has formalized the many-to-one relationship.

 

A theoretical framework for both processes is available by using a finite state machine. However, natural language has evolved to match the full reality of knowledge creation and sharing processes. Because of this rich match, the natural languages are natural, whereas the formal systems are quite artificial. The difference is something that my colleagues and I are working to make publicly understandable, given the current excessive interest in artificial intelligence. Since the topic has many aspects, I can only refer the reader to my written work at www.bcngroup.org.
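
A minimal sketch of that finite state framing, not taken from any system named in this note: a single transition table can be walked forward to produce sentences (generation) and checked against a given sentence (analysis). The states, vocabulary, and function names below are invented for illustration.

    import random

    # transitions: state -> list of (word, next_state) pairs
    TRANSITIONS = {
        "START": [("the", "SUBJECT")],
        "SUBJECT": [("fox", "VERB"), ("crow", "VERB")],
        "VERB": [("flatters", "OBJECT"), ("drops", "OBJECT")],
        "OBJECT": [("the cheese", "END")],
        "END": [],
    }

    def generate(state="START"):
        """Walk the machine forward, choosing transitions at random (generation)."""
        words = []
        while TRANSITIONS[state]:
            word, state = random.choice(TRANSITIONS[state])
            words.append(word)
        return " ".join(words)

    def recognize(sentence, state="START"):
        """Check whether the machine could have produced this sentence (analysis)."""
        remaining = sentence
        while TRANSITIONS[state]:
            for word, nxt in TRANSITIONS[state]:
                if remaining == word or remaining.startswith(word + " "):
                    remaining = remaining[len(word):].lstrip()
                    state = nxt
                    break
            else:
                return False
        return remaining == ""

    s = generate()
    print(s, recognize(s))  # e.g. "the crow drops the cheese True"

The point of the sketch is only that the same table serves both directions; the argument of this note is that natural language outgrows such machines.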

 

The problem of disambiguation is perhaps the easiest way to point out the severe limitations of the traditional expert systems and AI approaches to machine management of the context of textual data.

 

Vector quantification, on the other hand, can be used to show how the traditional approach has some merit. Data compression algorithms have been, and can be, used in a data-understanding framework, as in my work on the fractal-based understanding of mammography: http://www.bcngroup.org/admin/CIL/core/Index.htm
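
Reading vector quantification here as ordinary vector quantization, the sketch below illustrates the compression-for-understanding idea in generic terms: measurement vectors are replaced by indices into a small codebook, which both compresses the data and sorts it into interpretable categories. The codebook and data are invented; nothing here reproduces the mammography work cited above.

    import numpy as np

    def nearest_code(vectors, codebook):
        """Return, for each vector, the index of the closest codebook entry."""
        d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
        return d.argmin(axis=1)

    # toy two-dimensional "feature" data and a three-entry codebook
    data = np.array([[0.1, 0.2], [0.0, 0.1], [5.0, 5.1], [4.9, 5.2], [9.8, 0.1]])
    codebook = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])

    codes = nearest_code(data, codebook)
    print(codes)            # [0 0 1 1 2] -- the compressed representation
    print(codebook[codes])  # the reconstruction used when interpreting the data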

 

The Hidden Markov Model (HMM) approach has been developed extensively and has many of the positive qualities of vector quantification. HMM also has all of the limitations of a statistical approach.

 

The Acappella Innovation walks away from the traditional effort to represent human knowledge in the form of rules. The Innovation demands that one develop a universe of discourse by descriptively enumerating topics that define categories corresponding to the aspects of the universe of discourse. The topics are arranged in a natural fashion, in a way that mirrors the expression of written language in sections, subsections, paragraphs, sentences, phrases, and words. The exact form of the topic / question hierarchy is not as important as the expressive value of the tokens when viewed by a human mind.
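
As one illustration of the kind of topic hierarchy described above, the sketch below assumes a simple nested representation whose levels mirror sections, subsections, and paragraphs. The topic names are invented and do not reproduce the Acappella Generated Ontology.

    # topic -> sub-topics, nested like the sections and paragraphs of a document
    topic_ontology = {
        "Fable structure": {
            "Characters": {
                "Protagonist": {},
                "Antagonist": {},
            },
            "Moral": {
                "Stated moral": {},
                "Implied moral": {},
            },
        },
    }

    def enumerate_topics(node, path=()):
        """Walk the hierarchy, yielding each topic together with its position."""
        for name, children in node.items():
            yield path + (name,)
            yield from enumerate_topics(children, path + (name,))

    for topic_path in enumerate_topics(topic_ontology):
        print(" / ".join(topic_path))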

 

Many different linguistic expressions are sufficient to communicate almost exactly the same knowledge. In the same way, a topic ontology reaches a degree of sufficiency without imposing a crisp set of interpretations.

 

The questions are attached to the topics in order to create an evocative condition and thus allow knowledge sharing in a semi-structured fashion. Again, this is very natural, and it is accomplished in very much the same way that natural language is used in social discourse.
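
A minimal sketch of that attachment, assuming each topic simply carries a short list of prompts offered in order to evoke a semi-structured response. The topics and questions are invented for illustration.

    # topic -> the questions that make it evocative
    TOPIC_QUESTIONS = {
        "Moral / Stated moral": [
            "What lesson does the fable state outright?",
            "Does the ending support that lesson?",
        ],
        "Characters / Protagonist": [
            "What does the protagonist want?",
            "What mistake does the protagonist make?",
        ],
    }

    def present(topic):
        """Offer a topic's questions in order, as prompts for a human response."""
        return TOPIC_QUESTIONS.get(topic, [])

    for question in present("Moral / Stated moral"):
        print(question)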

 

Natural language generation comes about, also very naturally, by working out the details of how sentences can be formed within these linguistic forms. Phrases are composed within a model of how the topics might be responded to during the use of the Acappella Generated Ontology. Such a response is called an assessment in the speech and language assessment tool WordWeaver (www.wordweaver.com). The set of all possible phrase compositions has to be worked out in advance by someone who is trained to do this type of work. It is time consuming, but when completed it works very well, as can be seen from the WordWeaver product line.
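
One way such pre-authored phrase composition could be arranged is sketched below, assuming that responses to topics are recorded as simple levels and that a phrase template has been written in advance for each topic and level. The templates, topics, and levels are invented; this is not the WordWeaver implementation.

    # phrase templates worked out in advance, keyed by (topic, response level)
    PHRASES = {
        ("articulation", "strong"): "{name} produces all target sounds clearly.",
        ("articulation", "weak"):   "{name} has difficulty with several target sounds.",
        ("fluency", "strong"):      "{name} speaks fluently in conversation.",
        ("fluency", "weak"):        "{name} shows noticeable disfluency under pressure.",
    }

    def generate_narrative(name, assessment):
        """Compose a narrative from topic responses, one sentence per topic."""
        sentences = []
        for topic, level in assessment.items():
            template = PHRASES.get((topic, level))
            if template:
                sentences.append(template.format(name=name))
        return " ".join(sentences)

    print(generate_narrative("Alex", {"articulation": "strong", "fluency": "weak"}))
    # -> Alex produces all target sounds clearly. Alex shows noticeable disfluency under pressure.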

 

The capacity to generate reasonable narrative allows for, in the microcosm, the evolution of machine-to-human discourse. This discourse will be an illusion, of course. However, to the degree that knowledge within communities can be vetted using ontology streaming, this illusion will serve the useful purpose of conveying real knowledge.

 

I know of one group in Australia that is seriously working on natural language generation using finite state machines and state-gesture-response pairing. However, this work is not public, although it has support within an academic community.