Saturday, September 11, 2004
Background material on why a National Project is required
Discussion on ARDA proposal due October 13th
http://nrrc.mitre.org/arda_explorprog2005_cfp.pdf
Last edited:
9/14/2004 7:43 PM
BCNGroup Inc and Ontologystream Inc, who we are.
The Advanced Research and Development Activity (ARDA) 2005 Challenge Workshop Request for Proposal has the following areas that are of interest to us:
A) Novel Intelligence from Massive Data (NIMD)
(proposal to be developed by OntologyStream Inc)
Our Contributions to a future NIMD Glass Box
B) Advanced Capabilities for Intelligence Analysis (ACIA)
Sense making (proposal to be developed by MITi Inc)
C) Information Assurance (IA)
(proposal to be developed by Director of IA, East Michigan University)
(see sub-thread on anticipatory HIP applications to Information Assurance)
supporting
Anticipatory Human-centric Information Production (HIP) Challenge Problem
The Behavioral Computational Neuroscience Group (BCNGroup Inc) is a not-for-profit corporation serving an informal science community. After initial discussion with the ARDA point of contact and members of our community, we are working towards making three separate proposals for ARDA Community Building Workshops. Each of these will have a budget of $500K. The most important of the three community building workshops is the one proposed by OntologyStream Inc, in support of the NIMD Glass Box. In this Workshop, BCNGroup scientists will oversee the integration of technologies extending the Readware Provenance ™ software. Provenance ™ will be released before the Community Building workshop begins. User interfaces will be developed that assist the non-computer scientist in understanding the functions of Provenance ™ software as described in soft science terminology. The concept of a “Knowledge Sharing Foundation” will be prototyped as a means to bring new anticipatory HIP technology into service. Included in the activities of the Foundation will be the other two community building workshops. The Foundation will provide interoperable computer science revealing as much of the anticipatory technology as possible. Educational services will be provided via distance learning mechanisms accredited by more than one university. Community scholarship will be published for peer review in principled discussions expressed in electronic format. The Foundation supports dialog on a range of soft and hard science topics related to anticipatory technology and methodology.
We have some understanding of the history of the Glass Box, the original reasons why government clients conceived the Glass Box, and what has been developed so far. In 2002, a SAIC/Ontologystream proposal from our group was deemed fundable, for $3.2 M, in the first round of NIMD funding. Since 2002 we have had several contacts with Glass Box clients and program managers.
We have continued to work over the past year and a half. Our small group is now in a position to deploy a new type of system that relies on human cognitive acuity and becomes anticipatory in ways similar to how humans behave in familiar environments.
The system that we are ready to deploy was developed in accordance with a paradigm called HIP (Human-centric Information Production). An early form of HIP was described in that 2002 proposal. What we have now is simpler, more transparent to the user, and has a deeper grounding in natural science.
For us, as for others who think in a similar way, the past two decades have not been flush with funding. Since 9-11-2001, the funding prospects have been even more limited. We are, nevertheless, intent on forming a new interdisciplinary community by bringing together technologies that exhibit anticipatory capability and natural science that explains why the technology works.
Our work has been focused squarely on an important future paradigm in information science. This work is occurring in several countries and involves at least 20 major contribution lines. Our work on an “anticipatory technology” references several hundred specific individual contributing lines of research.
Related lines of research include associations with the
Human Mark-up Language standard,
Topic Maps standard,
abstract intelligent agents,
data structures,
topological logics,
ecological psychology,
evolutionary psychology,
general systems theory,
complexity theory,
graphical representation of human concepts,
natural language parsing,
biologically feasible mathematical models of human brain function,
social network measurement and modeling, and
models of deep structure in natural language expressions.
Members of a board of science advisors represent these lines of research. Advisors follow our deployment and testing activity and produce objective evaluation of the scientific merit of our approach.
Since 1992, the informal association of scientists has been discussing a National Manhattan-type Project, to create academic centers in the knowledge sciences. The planning for such a National Project is still very underdeveloped. However, the ARDA Workshop will facilitate additional collaboration while focused on the Anticipatory Human-centric Information Production Challenge Problem.
A number of the advisors are expected to make a presentation at one of the conferences being planned for the academic year 2004-05. The presentations will also serve as the first curriculum outline for academic knowledge science.
These presentations will make specific recommendations to IC sponsors on technology development based on anticipatory structures.
We can demonstrate almost all of the capability we proposed in 2002. A completion of this planned work can be achieved on a budget of $500,000, half of which targets community building. This budget includes $260,000 for salaries and software development at Ontologystream Inc.
We have established several collaborative environments using Grove software collaborative tools. What we need is access to data and some means to expose completed HIP systems in operational IC settings. We need a place to demonstrate the software, the data encoding and the HIP use philosophy.
Our proposal for participation in the ARDA Workshops serves the following objectives:
1) Community building around the concept of academic knowledge science
2) Tight binding between academic knowledge science and ARDA’s three regional research centers: Northeast Regional Research Center (The Mitre Corporation, Bedford, MA), Northwest Regional Research Center (Pacific Northwest National Laboratory, Richland, WA), and Southeast Regional Research Center (Georgia Tech Research Institute, Atlanta, GA and Oak Ridge National Laboratory, Oak Ridge, TN).
3) Creation of dual use anticipatory Human-centric Information Production systems.
The requested funding level allows some leeway as to classification of these proposals as “seedling” proposals or full Workshops.
Our group recently received capitalization based on an issue of private stock in Ontologystream Inc. The new capital is supporting our integration of Orb data encoding with the MITi Inc Readware substructural ontology into a commercial product called Readware Provenance ™. Anticipated release date is October 15th, in time for some use as a polling instrument development aid.
Provenance ™ provides pollsters with real time thematic analysis of the threads in social discourse. As such, Provenance ™ showcases the capabilities that we advised NIMD in 2002 we were prepared to develop and deploy.
Provenance ™ capabilities are focused on event detection. Consistent with the HIP paradigm, only limited use is made of deductive computation. We use parsing filters to find specific structure. We encode the substructure invariance, instrumented with the existing Readware framework, into the Ontology referential bases (Orbs).
The HIP paradigm makes a fundamental shift in assumption about the possible outcomes from purely deductive systems. Informational structures are developed that prime cognitive activity by humans. Explicitly we hold the position that natural science indicates that humans experience “knowledge”, and that this experience cannot be reproduced in a Turing machine. What we can do is to create a dependency between computer-encoded information and what the observing human experiences. This dependency supports a mutual “induction” of human experience of knowledge, in much the same way as the text in a book “causes” human mental experiences. How the HIP paradigm is explained has been a key challenge to scientists who have been working on the preliminary scholarship. The ARDA workshop(s) could provide the breakthrough that we have been looking for, resulting in high quality scientific discourse and in operational HIP systems that demonstrate what is possible when the computer is used to measure structure from “raw” data, and represent this structure to human(s) working within a “knowledge operating system” (KOS). The KOS was first described in our 2002 NIMD proposal.
We support human-centric annotation into multiple theories of meaning and context. These theories each have a common notational language, and this notational language has a consistent and operational data encoding standard and a standard means to produce data mining and data organizational processes. Our work is grounded in a theory about human anticipation and a physical theory about how natural processes are organized into organizational layers. Stratified theory is then used to discuss observations about the relationship between physical atoms and physical chemistry. The Conjecture on Stratification then abstracts this discussion and conjectures that complex phenomena, such as the behavior of the worldwide Al-Qaeda movement or the complete “system” supporting human language use, have either a periodic table with invariances corresponding to atoms or a set of potential meta-stable substructural ontologies whose role in producing observed phenomena can be made the object of objective investigations.
Our approach anticipates the current political discourse about the non-interoperability of information technology, by using a simple and universal data encoding of the most basic element of a cognitive graph, in the form < a, r, b > where a and b are nodes and r is a relational operator. A set of these basic elements is called an Orb set. Raw data is thus encoded as facts without organizing these facts in any way. The facts are in the form < a, r, b > or < r, a(1), a(2), . . . , a(n) > and often do not have contextual information, since the theory tells us that a measurement of substructure should be made separately in order to acquire “localized” facts.
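As a minimal sketch of the encoding described above, the Python below represents each fact as a triple < a, r, b > and an Orb set as a plain, unordered set of such triples. The names `Fact` and `encode_adjacent`, the relation label "co-occurs", and the choice of adjacent tokens as the thing being measured are illustrative assumptions, not part of any Orb specification.

```python
from typing import Iterable, NamedTuple, Set


class Fact(NamedTuple):
    """One basic element < a, r, b >: two nodes joined by a relation."""
    a: str
    r: str
    b: str


def encode_adjacent(tokens: Iterable[str], relation: str = "co-occurs") -> Set[Fact]:
    """Encode each pair of adjacent tokens as a fact.

    Adjacency is a hypothetical measurement rule chosen for illustration;
    any parsing filter could supply the (a, b) pairs instead.
    """
    items = list(tokens)
    return {Fact(a, relation, b) for a, b in zip(items, items[1:])}


# The resulting Orb set is an unordered collection of localized facts,
# with no further organization imposed on it.
orb_set = encode_adjacent(["analyst", "reads", "report"])
```

Because the Orb set is an ordinary set, no ordering or context is stored with the facts, which mirrors the point that organization is added later and separately.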
Co-occurrence, or a more general form of relatedness, pairwise relates individual substructural elements. The identification of compounds is derived from the Orb set, via convolution. Compounds are often expressed as n-aries in the form < r, a(1), a(2), . . . , a(n) >.
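One way to picture the step from pairwise facts to n-ary compounds is sketched below in Python. The convolution operator itself is not defined in this text, so the sketch substitutes a simple stand-in: for each relation r, nodes that are linked directly or indirectly by facts with that relation are gathered into one tuple < r, a(1), . . . , a(n) >. The function name `compounds` and the use of connected components are assumptions made only for illustration.

```python
from collections import defaultdict


def compounds(orb_set):
    """Group nodes linked by the same relation into n-ary tuples (r, a1, ..., an).

    Connected components are used here as a stand-in for the convolution
    step; this is an illustrative assumption, not the authors' operator.
    """
    # adjacency[r][node] holds the nodes paired with `node` under relation r
    adjacency = defaultdict(lambda: defaultdict(set))
    for a, r, b in orb_set:
        adjacency[r][a].add(b)
        adjacency[r][b].add(a)

    result = set()
    for r, graph in adjacency.items():
        seen = set()
        for start in graph:
            if start in seen:
                continue
            # Depth-first walk collecting everything reachable from `start`
            stack, component = [start], set()
            while stack:
                node = stack.pop()
                if node in component:
                    continue
                component.add(node)
                stack.extend(graph[node] - component)
            seen |= component
            result.add((r, *sorted(component)))
    return result


facts = {("a", "co", "b"), ("b", "co", "c"), ("x", "co", "y")}
found = compounds(facts)
```

Sorting the members of each compound is only a convenience so that the same compound always has the same written form.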
We use the term “stratification” to separate the structure of compounds, expressed as sets of elements from a specific substructural ontology. The assignment of meaning is made through a “principled” study of the function that similar compounds have. This assignment task is called qualitative structure activity relationship analysis (Q-SAR). Several groups use Q-SAR methods in biochemistry and in situational logics as applied to control tasks. Our group has specific expertise in this area. Q-SAR can be performed using any one of several methods.
Human judgment is often better than formal Q-SAR methods. Our SLIP (Shallow Link analysis Iterated scatter gather and Parcelation) visualization software (Prueitt, 2001) brings structural patterns into an immersion environment where human and community judgments can be rapidly exercised. The Orb encoding optimizes convolution and thus massive data can be parsed in real time by specific filters, encoded into Orb sets, and subsets of the encoding visualized almost in real time. The resulting subsets are patterns of related structure whose meaning can then be judged, or automatically placed into pre-existing categories using one of several methods. The annotated patterns then become the basis for machine “guesses” as to future functions of patterns.
Data is written either in simple hash tables or in simple ASCII files and can be expressed as binary-XML. Global “convolution operators” are then used in real time to almost instantly provide an organization where context and nuance can be added via human cognitive priming and human use of tacit knowledge. Distributed sharing of these resources uses encryption to maintain security.
In many instances the semantic relationship between “a” and “b” is uncertain or not properly labeled. Thus in most cases the facts are not semantic, i.e., not about meaning. The facts are measured observations about structural co-occurrence. In real time measurement of data the structural co-occurrence is presented without fixed interpretation. A secondary process can provide additional recourse such as plausible reasoning using situational logics, polylogics and schema logic.
The provenance compression produces a thematic analysis of real time social discourse from the machine processing of massive amounts of real text from real discussions. Our anticipatory technology, based on a separation of computer encoded memory of invariance from the templated anticipatory mechanisms, creates and stores periodic tables related to how the measured invariance and patterns of invariance fit into event formation. The production of these tables follows the best traditions of the natural sciences, using both observation and theory to advance a body of research.
Formal situational and formative logics based on Mill’s and Peircean formalisms lead to qualitative structure activity relationship analysis. Our methodology produces compact artifacts called substructural ontologies and ultrastructure templates (similar to the work that Jeff Long used in knowledge encoding for DOE). The formalism supports plausible inference and mutual induction involving a crisp separation between the tasks assigned to computer algorithms and the tasks assigned to human cognitive behavior. Very little dependency is placed on deductive inferencing, which we regard as being too weak to reliably recognize novelty or to make reification judgments.
We are integrating existing software methods from several lines of research, without costly licensing agreements. We can do this either because we have understood the methods and generalized them so as to avoid patent claims, or because we are cooperating with patent holders in an effort to bring a new type of information science into the marketplace.
We strongly feel that our group has contributions to make to the design of software that meets the original functional objectives of NIMD.
At the heart of the NIMD Glass Box architecture, we present measured structural invariance from real time raw data. The set of invariances is then organized in a fashion that is similar to several instances in the research literature where “periodic tables” or frameworks are created for measuring the function of events where incomplete and uncertain information is available. Specifically we refer to the Readware letter semantics and the work done in the former Soviet Union by Victor Finn and Dmitri Pospelov.
In general terms we refer to the measured invariance as categorical Abstraction (cA) and the aggregation patterns for significant event structures as event Chemistry (eC). Categorical Abstraction is measured using latent semantic techniques such as algebraic LSI (owned by SAIC), probabilistic LSI (owned by Recommind Inc), or generalized LSI (gLSI) discovered by Prueitt (2003). A formal notation has been developed, by Prueitt, for mapping continuum mathematical models to discrete formalism expressed as Orb sets, i.e., in the form { < a, r, b > }. This formalism is called Differential Ontology Framework (DOF). Using DOF one automates the production of Orb sets from the classical latent technologies. Other means for the development of Orb sets also exist, and once Orb sets are produced they can be easily added and subtracted to produce integrations of information.
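The remark that Orb sets can be added and subtracted can be illustrated with ordinary set operations. The Python sketch below treats addition as set union and subtraction as set difference; that reading, and the names `add_orbs` and `subtract_orbs`, are assumptions for illustration rather than the DOF definitions.

```python
def add_orbs(*orb_sets):
    """Integrate several Orb sets by taking the union of their facts."""
    result = set()
    for orb in orb_sets:
        result |= set(orb)
    return result


def subtract_orbs(base, removed):
    """Remove from `base` every fact that also appears in `removed`."""
    return set(base) - set(removed)


# Two Orb sets produced by different (hypothetical) measurement processes
source_one = {("a", "r", "b"), ("b", "r", "c")}
source_two = {("b", "r", "c"), ("c", "r", "d")}

integrated = add_orbs(source_one, source_two)
only_new = subtract_orbs(source_two, source_one)
```

Under this reading, a fact measured by both sources is stored once in the integrated set, and subtraction isolates what a new source contributes beyond what is already held.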
A formal theory modeled after the mathematical convolution has been developed to aggregate subsets from Orb sets. The subsets of Orb sets are Orb sets, so the class of convolution operators can be studied for formal properties. The aggregated results are presented to humans in such a fashion that the role of human tacit knowledge would not be second-guessed by a hard encoding of the paradigm of hard cognitive engineering.
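The closure property noted above (a subset of an Orb set is itself an Orb set) means that subset-selecting operators can be chained freely. The Python sketch below shows that property only; the operators `touching` and `with_relation` and the helper `compose` are hypothetical examples and are not the convolution operators of the formal theory.

```python
from typing import Callable, FrozenSet, Tuple

Fact = Tuple[str, str, str]          # < a, r, b >
OrbSet = FrozenSet[Fact]
Operator = Callable[[OrbSet], OrbSet]


def touching(node: str) -> Operator:
    """Operator that keeps the facts in which `node` appears as a or b."""
    def apply(orb: OrbSet) -> OrbSet:
        return frozenset(f for f in orb if node in (f[0], f[2]))
    return apply


def with_relation(relation: str) -> Operator:
    """Operator that keeps the facts whose relation equals `relation`."""
    def apply(orb: OrbSet) -> OrbSet:
        return frozenset(f for f in orb if f[1] == relation)
    return apply


def compose(*operators: Operator) -> Operator:
    """Chain operators left to right; the output of each is again an Orb set."""
    def apply(orb: OrbSet) -> OrbSet:
        for operator in operators:
            orb = operator(orb)
        return orb
    return apply


data: OrbSet = frozenset({("a", "r1", "b"), ("a", "r2", "c"), ("c", "r1", "d")})
selected = compose(touching("a"), with_relation("r1"))(data)
```

Because every operator returns an Orb set, properties such as whether two operators commute can be checked directly on small examples.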
The study of human analysts’ analytic and cognitive behavior would be more useful if computer technology were based on Orbs and anticipatory design principles.
We agree that a soft cognitive engineering is vital to the future of Intelligence Community knowledge management, but we are uncomfortable with the type of hard cognitive engineering being funded by government agencies.
Many analysts feel that the current implementation of a hard cognitive engineering regime is imposed on the analyst. We suggest that a poll of analysts would verify this impression. Software design and classical theory about information and control may be the central problem. The problem is not unique to the NIMD Glass Box program.
For example, premature closure in analytic decision-making processes may be due to a type of learned helplessness on the part of analysts who are frustrated with a software system that is managing the analyst rather than acquiring and presenting the relevant data in real time. The study of premature closure in an environment where frustration is due to the design of the software does little to advance the state of the art, unless these results are used to refactor the software design and make the design HIP and anticipatory. Our community is prepared to assist in this refactoring process.
Soft cognitive engineering shifts the origin of control towards the user in real time, in real situations, while encoding human behavior for use in future automation processes. We would use the Human Markup Language standard to mark up human behaviors related to analytic and cognitive activity.
What exactly is the current state of the Glass Box? Is there an objective AS-IS model of the data structures, computational processes, and analytic interface to analysts? Would our team of scientists be allowed to make our own independent evaluation of the design, implementation and function of the Glass Box in an open source setting?
In the ARDA RFP, it is stated:
NIMD aims to preempt strategic surprise by addressing root causes of analytic errors related to bias, assumptions, and premature attachment to single hypothesis. The program may also assist with capture and reuse of analytic best practices.
At the heart of NIMD is a piece of software called the Glass Box that resides on an analyst’s workstation and captures the parts of the analytic process that happens online. … NIMD research is developing techniques and tools that infer the state of analysis and the analytic process from Glass Box data, assist analysts in making explicit their analytic (cognitive) state, and uses the captured knowledge and analytic models to drive automated organization and exploration of massive data. (for the complete statement see the ARDA RFP)
Our position is that this aim is misplaced because the acquisition of data in a usable format is poorly developed, largely because
1) the existing system most often depends on relational database encoded data,
2) the source of data is often removed from original sources and is itself often derived from reports generated within a stovepipe organizational framework,
3) the use of ontology services is restricted to Cyc Corp type ontology services, where first order predicate logics act on logical atoms in a way that unpredictably creates false sense-making and fidelity issues, and
4) in spite of claims, no computer program has demonstrated human cognitive ability.
Our approach is called HIP (Human-centric Information Production) and would have at the heart of NIMD architecture humans who have informed use of anticipatory services equipped with a category theory grounded encoding of data invariances and patterns of invariances. The human tacit experience would drive the acquisition of a repository of patterns, while leaving the interpretation of the meaning of each of these to human cognitive capabilities.
The Glass Box would become a transparent view into data encoded into the Orb constructions. The Glass Box would be an instrument that aids humans in seeing in real time the structure of data, even when that data is massive and not organized in the raw for observation.