Tuesday, December 06, 2005
Center of Excellence Proposal
→ [bead thread on curriculum reform]
The Taos Institute
(on the possibilities)
Communication from Paul Werbos
Good morning, folks!
I have been out for a few days, and have to catch up with overdue bureaucratic duties today. I have only had a chance to scan a fraction of the exchange of the last few days -- and clearly your comments on localization deserve more attention.
But... just a couple of thoughts.
Just as a matter of semantics, I would classify a true PDE model of physics as a "local" model. (But "local" means many different things to many different people, in different contexts.) In my personal, heretical view, I think Einstein was right: the "best" opportunity before us today to unify all of laboratory physics lies in returning to the PDE formulation, with a new understanding of what it really offers us. That's an analog model, not a digital model. Dick Ballard's new posting suggests that we may be closer here than I thought. The word "best" summarizes many, many considerations -- something doable, comprehensible, consistent with Occam's Razor, flexible, able to address phenomena and aspects not dealt with by other approaches, and so on.
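As a concrete, textbook-style illustration of what "local" means here -- a generic example, not the specific model of the hep-th paper -- consider a classical field equation of the form:

```latex
% A generic local PDE model: the dynamics at a space-time point x depend
% only on the field and finitely many of its derivatives at that same point.
\[
  \partial_\mu \partial^\mu \varphi(x) + V'\big(\varphi(x)\big) = 0
\]
% Nothing in the equation refers to the field at any other point x' --
% that is the sense of "local" used here, whatever the solutions then do globally.
```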
"Local" models of this kind can give rise to emergent effects that many people would regard as nonlocal effects or just plain weird stuff. (The respected leaders of quantum computing often describe their field as "harnessing weirdness" and the like.)
But -- even though this is the best opportunity before us TODAY, that does not prove that it MUST BE the ultimate truth. I sometimes think that leaders of theoretical physics today might benefit from refreshing their memory of history, and of some basic principles of how science learns things, even à la Francis Bacon. Even as we organize our thinking, we need to keep a kind of "open window," so that we can see quickly when nature starts to give us hints from beyond our present concepts. This is the reason why I challenged some leading mathematicians to develop some of the theory that might help us look for true NONLOCALITY in the basic laws of physics. It has never really been seen -- but perhaps we might see it if we were better equipped to look. The scientific method demands that we foster the development and refinement of ALTERNATIVE models, and not just the one we happen to be most excited by at the time.
But I do wonder how much consistency there may be between Richard's model and the PDE type of model I happen to be most excited by at this time. (For my version, search hep-th at arxiv.org for me as author.)
=============
My curiosity about this is piqued somewhat by his comments about the rates of convergence of various types of approximation series. Certainly, a lot of my thinking is essentially an outgrowth or development of the directions von Neumann was pursuing. I have to admit that Dick's posting has stimulated me to notice some connections I had not noticed before.
Within the neural network field, work by Sontag and by Barron on function approximation theorems plays a very fundamental role. For practical neural network and engineering work on static mappings Y=f(X,W), I generally talk most about Barron's results. Barron showed that a few essentially nonlinear approximators -- like the Multilayer Perceptron (MLP) neural network -- work "better" than any linear basis function approximators, such as Taylor series, radial basis functions, mixtures of Gaussians, gain scheduling, lookup tables, etc. In this case, "better" means that the number of parameters needed to maintain a given level of accuracy in approximating a "smooth" function rises as a low power (1/3 root, I think) of the number of inputs to the function, whereas the number of parameters required rises exponentially for the linear basis function approximators. But Sontag -- in Jesuitical style -- has often stressed that "rational approximators" also do well. Some of Dick Ballard's comments do resonate with Sontag's observations about the difference in convergence rates of polynomials (a form of linear basis function approximator) and rational approximators -- ratios of polynomials.
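To make that convergence-rate contrast concrete, here is a minimal numerical sketch (my own illustration, not anything from Barron's or Sontag's papers): fit Runge's function on [-1,1] with an ordinary polynomial and with a ratio of polynomials of the same degree, and compare worst-case errors. The linearized least-squares fit of the rational form is a standard trick, assumed here purely for illustration.

```python
import numpy as np

def fit_poly(x, y, deg):
    """Linear-basis-function approximator: ordinary polynomial least squares."""
    return np.polyval(np.polyfit(x, y, deg), x)

def fit_rational(x, y, deg):
    """Rational approximator p(x)/q(x), fit by the standard linearization
    y*q(x) ~= p(x), with q's constant term pinned to 1 (illustration only)."""
    P = np.vander(x, deg + 1, increasing=True)     # columns 1, x, ..., x^deg
    A = np.hstack([P, -(y[:, None] * P[:, 1:])])   # unknowns: p0..p_deg, q1..q_deg
    c, *_ = np.linalg.lstsq(A, y, rcond=None)
    p, q = c[:deg + 1], np.concatenate([[1.0], c[deg + 1:]])
    return (P @ p) / (P @ q)

x = np.linspace(-1.0, 1.0, 2001)
y = 1.0 / (1.0 + 25.0 * x**2)   # Runge's function: smooth, but hard for polynomials
for deg in (4, 8, 16):
    e_poly = np.max(np.abs(fit_poly(x, y, deg) - y))
    e_rat = np.max(np.abs(fit_rational(x, y, deg) - y))
    print(f"degree {deg:2d}: polynomial max error {e_poly:.3f}, rational {e_rat:.2e}")
```

Runge's function is itself a ratio of polynomials, so the rational form recovers it almost exactly, while the polynomial error shrinks only slowly with degree -- a one-dimensional taste of the convergence gap Sontag emphasizes.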
Eduardo Sontag of Rutgers has done excellent original mathematical work in many areas -- but I never took this part very seriously, because it is not valid as a criticism of neural networks. The brain is not a single-input single-output (SISO) system, and one cannot design a brain by using a simple ratio of two polynomials. In any case, adapting such a thing would blow up the hardware fairly often, if it were used as an architecture for chips or neurons.
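The "blow up" hazard is easy to see even in one dimension -- a toy sketch of my own, not any proposed architecture: a small shift in a single denominator weight can move a pole into the operating range.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 400)
for c in (1.05, 0.95):              # small change in one denominator parameter
    r = 1.0 / (x - c)               # r(x) = p(x)/q(x) with p = 1, q(x) = x - c
    print(f"c = {c}: max |r(x)| on [-1, 1] is {np.max(np.abs(r)):.0f}")
```

During adaptation, a denominator root can drift into the input range in just this way, which is why a raw ratio of polynomials is a fragile thing to put into hardware or a neuron model.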
But... what of the matrix generalization? What of the further generalization where matrices are replaced by MLPs? (Narendra's early work on neural network control was motivated in part by seeing how natural this generalization is. He has talked a lot about that.) Dick's comments seem to address the first generalization.
In fact, if we make the double generalization... the double generalization of a rational function approximator is a class of neural network which I call the Simultaneous Recurrent Network (SRN). This is quite different from the "Simple Recurrent Network," a term coined later by other folks for reasons not appropriate to discuss here. Because the experience of the brain is not always "smooth" (in the more or less Lipschitz sense assumed in Barron's theorems), I have argued that we need to use SRNs instead of MLPs for static function approximation, when we are trying to design a maximally general, maximally powerful brain. At arxiv.org, in the adap-org series within nlin-sys, Pang and I have a paper which gives a simple example of the huge practical importance of this generalization. It is essential to areas like brain-like video segmentation, autonomous navigation, strategic games, and so on. The limits of MLPs are not the limits of neural networks; one can do better. (By the way, associative memory types of neural network like ART are not even as powerful as MLPs for such tasks -- but that is a story for another time.)
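For readers who have not seen an SRN, here is a minimal Python sketch of the construction as described above -- the sizes, initialization, and fixed-point tolerance are hypothetical choices for illustration, not anything from the Pang paper. The same feedforward core is applied over and over, with its own output fed back alongside the fixed external input, until the state settles.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(z, W1, b1, W2, b2):
    """One feedforward pass: the core network reused at every relaxation step."""
    h = np.tanh(W1 @ z + b1)
    return np.tanh(W2 @ h + b2)

def srn_forward(x, params, n_iter=50, tol=1e-6):
    """Simultaneous Recurrent Network: feed the core's output back as input,
    together with the fixed external input x, until the state stops changing
    (or a maximum number of relaxation steps is reached)."""
    W1, b1, W2, b2 = params
    y = np.zeros(W2.shape[0])
    for _ in range(n_iter):
        y_new = mlp(np.concatenate([x, y]), W1, b1, W2, b2)
        if np.max(np.abs(y_new - y)) < tol:
            return y_new
        y = y_new
    return y

n_in, n_state, n_hid = 3, 4, 8   # hypothetical sizes
params = (rng.normal(0, 0.2, (n_hid, n_in + n_state)), np.zeros(n_hid),
          rng.normal(0, 0.2, (n_state, n_hid)), np.zeros(n_state))
print(srn_forward(rng.normal(size=n_in), params))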
In summary -- the observations Dick has alluded to do mesh with some critical observations made in other contexts, though the nonlinear case does depend on some additional effects. (Also, for the record, the SRN by itself is still only a steppingstone. There is also the cellular SRN and the ObjectNet, described in my tutorial at www.eas.asu.edu/~nsfadp).
Likewise, similar mathematical effects are also crucial in time-series modeling. (And this is an area where I still cite Sontag's work, when I get time to explain that far.) The ordinary series really are like the autoregressive or FIR models, or time-delay neural networks; the fractions are like the true ARMA models or IIR filters or TLRNs. TLRNs -- also discussed in my tutorial -- turn out to be absolutely essential to the most powerful neural network applications out there today involving dynamical systems.
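A toy linear sketch of that difference (my illustration, nothing specific to TLRNs): a single feedback parameter produces an infinite impulse response that a feedforward tapped delay line can only approximate with many taps.

```python
import numpy as np

a = 0.9
x = np.zeros(200); x[0] = 1.0           # unit impulse input

# IIR / ARMA-style recurrence: one feedback parameter, infinite memory
y_iir = np.zeros_like(x)
for t in range(len(x)):
    y_iir[t] = x[t] + (a * y_iir[t - 1] if t > 0 else 0.0)

# FIR / "ordinary series": a truncated tap series chasing the same tail
for n_taps in (5, 20, 80):
    taps = a ** np.arange(n_taps)
    y_fir = np.convolve(x, taps)[:len(x)]
    print(f"{n_taps:2d} taps: max |FIR - IIR| = {np.max(np.abs(y_fir - y_iir)):.4f}")
```

A TLRN wraps an MLP around the delayed internal state instead of a linear recurrence, and inherits the same parameter economy for dynamical systems.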
------------
But: what is the relation to physics?
Actually, it is quite strong.
A key point I have stressed down through the years, over and over again, is that time-forward Markov process intuition is what screwed up the USE of Einsteinian "classical field theory" (PDE). If one analyzes the predictions of PDE models in a mathematically more correct manner, by thinking in terms of local Markov Random Fields across space-time, the paradoxes all disappear. I published that many years ago, and the history of various rediscoverers is discussed in papers you can easily see at arxiv.org. But -- there is then the big challenge of moving ahead, and showing one can actually USE this kind of model to come to terms directly with the range of empirical phenomena we have seen, in a unified manner. That's what the new hep-th paper is about. But -- it would require a real expert in real analysis (à la Walter Strauss) to do the next step as outlined there -- to show that this approach immediately leads to the first truly rigorous, axiomatic formulation of quantum field theory. I would do it myself -- if life were not placing many rather intense competing demands on me lately, from the absurd and mundane to the global life-or-death.
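In symbols, the contrast is between the familiar time-forward factorization and an undirected factorization over space-time -- a schematic only; the hep-th paper gives the real construction:

```latex
% Time-forward Markov process intuition: statistics factorize along time,
% conditioning only on the past.
\[
  P(x_1, \dots, x_T) \;=\; P(x_1) \prod_{t=1}^{T-1} P(x_{t+1} \mid x_t)
\]
% Markov Random Field across space-time: an undirected factorization into
% local clique potentials, with no preferred arrow of time in the statistics.
\[
  P(\varphi) \;\propto\; \prod_{c \,\in\, \text{cliques}} \psi_c(\varphi_c)
\]
% Conditioned on its space-time neighborhood -- "past" and "future" alike --
% each local region is independent of the rest; locality survives, but the
% habit of conditioning only on the past does not.
```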
I do not know whether Dick has backed up into some of the same mathematics from a different starting point. Such things do happen at times. But I do know there is no escaping the inherent complexity even of the portions I have addressed, and those portions are a crucial part of the story.
==============
Best,
Paul W.