Thursday, November 24, 2005
Center of Excellence Proposal
ONTAC stands for Ontology and Taxonomy Coordinating Working Group
It is a working group of the Semantic Interoperability Community of Practice (SICoP).
Communication from John Sowa
Jim started this thread by talking about numbers like N and N-squared, but that quickly runs into counting issues before it is even clear what N refers to or what kind of math we should apply to it.
Nicolas raised an issue that gets closer to the heart of the matter:
NFR> ... it should be also very clear that the information
flow ontology for the communication between A and B is
fundamentally dependent on knowledge of what A and B
do with that information.
Indeed, there's communication, and there's "what A and B do". The first depends critically on some language, vocabulary, and speech acts. The second gets into "doing", which could mean several things: a logic-based inference, a procedural computation, a process that triggers physical events, or just storage for future reference. And both *speech acts* and *doing* imply some agent who has some purpose -- usually task oriented -- for saying or doing.
Of all these things, the vocabulary is the most obvious, and it is the one that tends to be the most stable, even if, or perhaps precisely because, the words may have more than one sense. WordNet makes provision for multiple senses, but many ontology projects concentrate on a fixed set of senses or types, each defined by precisely specified axioms.
In science, precision is good because it makes theories easier to test -- or, as Popper said, *falsify*. But a synonym for "easily falsified" is "fragile", as we have unfortunately learned with many of our computer systems. In communication, some vagueness is often good, because it makes a statement easier to verify, not falsify. As Peirce said, "It is easy to be certain, one has only to be sufficiently vague."
Jim raised another point:
JRS> Existing upper ontologies (SUMO, DOLCE, OpenCyc) aren't
being used much today, so why attempt to build a better one.
Why not try working with existing ones?
He proposed a test:
JRS> Are there any members of this forum willing to select an
existing upper ontology and try working with it? If so, I
suggest they conduct an evaluation, make a selection, and see
what tests and demonstrations can be run. If they show promise,
a case can be made for building a better upper ontology.
The biggest ontology ever developed has been tested over a period of 21 years: Cyc. It was originally supported by a group of large companies, each of which invested $500K per year and assigned its own personnel to test the ontology on practical problems. Each sponsor (the sponsors also included several government agencies) had full access to the Cyc resources, including the Cyc developers for assistance and consulting. There have been some small applications, but no success stories with a positive Return On Investment.
One problem with Cyc is that the axioms are so precisely defined that the system is fragile. Some people have recommended more formality, but that would make it even more fragile. There is no evidence that building a very large ontology, without a major change in the strategy for use and deployment, would be more successful.
I'm not against precision. We must have it for microprocessors or spacecraft that go to Saturn. But very precise calculations and inferences are always narrowly focused and highly specialized. Even though Cyc may be precise, it is still too general to solve most of the specialized problems that people need to solve.
We don't even have to look at spacecraft or microprocessors. Just look at the HALO study, which addressed the task of answering questions on a freshman chemistry exam. They tested three projects, including Cyc. Although Cyc started with the largest predefined ontology in the world, it had little or no advantage over the other two projects, both of which achieved a somewhat better score than Cyc.
Gary cited another of Doug Lenat's examples:
GBC> "If it’s raining, carry an umbrella." The following are
assumed in this summary rule...
Then he listed ten assumptions implicit in the rule, such as "the performer is sane" or "their actions permit them a free hand (e.g., not wheelbarrowing)". Gary (and Lenat) emphasize that the number of such variations is open ended.
These problems with Cyc arise in every branch of science, engineering, and business. The number of possible but unlikely exceptions to any rule is so large that the probability that at least one of them will occur is very high. That's called Murphy's Law.
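The arithmetic behind that last point can be made concrete with a small sketch. It rests on a simplifying assumption that the argument itself does not make: there are n exceptions, each independent of the others and each occurring with the same small probability p. Under that assumption the chance that none occurs is (1 - p)^n, so the chance that at least one occurs is 1 - (1 - p)^n, which climbs toward 1 as n grows even when p is tiny. The function name `prob_at_least_one` is hypothetical.

```python
def prob_at_least_one(p: float, n: int) -> float:
    """Chance that at least one of n exceptions occurs.

    Simplifying assumption (not from the original text): the n exceptions
    are independent and each occurs with the same probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    # Complement of "no exception occurs", which has probability (1 - p)^n.
    return 1.0 - (1.0 - p) ** n


# One-in-a-thousand exceptions become near-certain once there are enough of them.
for n in (10, 100, 1_000, 10_000):
    print(f"n = {n:>6}: P(at least one) = {prob_at_least_one(0.001, n):.3f}")
```

With p = 0.001, the probability is about 0.63 at n = 1,000 and exceeds 0.99 at n = 10,000. Real exceptions are rarely independent or equally likely, so this is only an illustration of the trend, not a model of any actual rule base.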
Cory discussed the "meta concepts common across architectural languages and notations" such as "UML, E-R models, OWL, Collaboration Modeling, Services Interfaces, Information Models, FEA-RMO, etc."
CC> The approach is to normalize and unify the concepts
expressed in these various languages into a controlled but
open set of concepts, this is the "semantic core". These
concepts may be introduced from any of the architectural
languages -- our job is to try and "slice and dice" the
concepts so that they fit together (where possible) and are
non-redundant (where possible). We can then describe the
mapping and/or transformation of various tools and
representation into this common form.
This classification is very different from the ontologies of Cyc, SUMO, DOLCE, or BFO. Instead of analyzing the content or subject matter, it addresses the metalevel and analyzes the kinds of tasks that are performed on that content. This is orthogonal to the classification of content, but it may be very important for the applications that use the ontologies.
Although I still believe that further research in ontology is important, I have little faith in the _Field of Dreams_ slogan: "If you build it, they will come." Cyc has been built, and the customers have not come. The major question is what strategies for designing and deploying ontologies might be more successful. Following are some points to consider:
1. Standardized vocabularies, terminologies, and nomenclatures were developed long before computers became available, and their value has been abundantly demonstrated, even without formal axioms associated with any of the terms.
2. Many such terminologies have logical errors that must be corrected. For example, three major links between terms must be clearly distinguished: type-subtype, type-instance, and whole-part. Some classifications lump all three under the heading broader-narrower, but that leads to serious confusion.
3. Other relationships should also be represented, such as locationOf, containerOf, attributeOf, and various relations of geography, kinship, and politics.
4. When two or more terms in the vocabulary have the same supertype, the differentiae that distinguish them should be explicitly stated, but very detailed axioms can often be more of a hindrance than a help.
5. More detailed axioms from science, engineering, law, philosophy, sociology, etc., are likely to be far too specialized and theory-dependent; in a general-purpose ontology they are not only unnecessary but highly undesirable. For example, a general ontology should be neutral with respect to 3-D or 4-D models of space-time, situation calculus vs. pi calculus, or continuant-based vs. process-oriented ontologies.
6. The logic required for the general ontology should be very simple. Aristotle's syllogisms, which are a subset of description logics, are sufficient for the definitions discussed in points #2, #3, and #4 above. More complex logics should be limited to more specialized microtheories for particular applications and kept out of the general ontology.
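Points #2, #4, and #6 can be illustrated with a minimal sketch. It assumes a hypothetical `Terminology` class (not any existing library or standard) that stores type-subtype, type-instance, and whole-part links in three separate tables instead of lumping them under broader-narrower, records a differentia for each subtype, and draws conclusions only by chaining subtype links. That chaining is the syllogism Barbara (every A is a B, every B is a C, so every A is a C); no richer logic is used. The example terms and differentiae are illustrative only, and the type annotations assume Python 3.9 or later.

```python
from dataclasses import dataclass, field


@dataclass
class Terminology:
    """Hypothetical minimal terminology that keeps three link kinds separate."""
    supertypes: dict[str, set[str]] = field(default_factory=dict)   # type -> direct supertypes
    instance_of: dict[str, set[str]] = field(default_factory=dict)  # individual -> direct types
    part_of: dict[str, set[str]] = field(default_factory=dict)      # part -> wholes
    differentia: dict[str, str] = field(default_factory=dict)       # subtype -> what sets it apart

    def add_subtype(self, sub: str, sup: str, differentia: str = "") -> None:
        self.supertypes.setdefault(sub, set()).add(sup)
        if differentia:
            self.differentia[sub] = differentia

    def add_instance(self, individual: str, typ: str) -> None:
        self.instance_of.setdefault(individual, set()).add(typ)

    def add_part(self, part: str, whole: str) -> None:
        self.part_of.setdefault(part, set()).add(whole)

    def all_supertypes(self, typ: str) -> set[str]:
        """Apply Barbara repeatedly, following subtype links only."""
        seen: set[str] = set()
        stack = list(self.supertypes.get(typ, ()))
        while stack:
            t = stack.pop()
            if t not in seen:
                seen.add(t)
                stack.extend(self.supertypes.get(t, ()))
        return seen

    def types_of(self, individual: str) -> set[str]:
        """An individual belongs to its direct types and all of their supertypes."""
        direct = self.instance_of.get(individual, set())
        result = set(direct)
        for t in direct:
            result |= self.all_supertypes(t)
        return result

    def missing_differentiae(self) -> set[str]:
        """Subtypes that share a supertype with a sibling but have no stated differentia."""
        children: dict[str, set[str]] = {}
        for sub, sups in self.supertypes.items():
            for sup in sups:
                children.setdefault(sup, set()).add(sub)
        return {c for kids in children.values() if len(kids) > 1
                for c in kids if c not in self.differentia}


t = Terminology()
t.add_subtype("Cat", "Mammal", differentia="has retractable claws")
t.add_subtype("Dog", "Mammal")          # sibling of Cat, differentia left unstated
t.add_subtype("Mammal", "Animal")
t.add_instance("Felix", "Cat")          # type-instance, not type-subtype
t.add_part("Tail", "Cat")               # whole-part, not type-subtype

print(sorted(t.all_supertypes("Cat")))
print(sorted(t.types_of("Felix")))
print(sorted(t.all_supertypes("Tail")))  # part-of links are never followed as subtype links
print(t.missing_differentiae())
```

Keeping the three links in separate tables means a tail is never inferred to be a kind of cat or animal, which is exactly the confusion that a single broader-narrower relation invites. Anything beyond this kind of chaining would belong in a specialized microtheory.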
This outline suggests a major reduction both in the complexity of the logic and in the number of highly controversial issues, about the nature of space-time, processes, objects, etc., that the general ontology must address. Those issues may be extremely important for many purposes, but the fact that they are controversial means that they should be relegated to specialized microtheories, not the fundamental framework.
Meanwhile the kinds of metalevel discussions that Cory mentioned might be able to relate the ontology to software development processes in a way that Cyc never could. However, that is another controversial issue that should not be part of the fundamental framework.
John Sowa