InORB Technologies Inc

 

BCNGroup e-Bulletin

 

 

Objective of the investment plans

Some examples

 

 

 


 

A First Tutorial on the InOrb Technology

Subject Matter Indicators

 

 

The scientist trains to develop a specific insight into things that are.  Without specific knowledge, the nature of this training is not simple to justify. 

 

My training has been in pure and applied mathematics, linguistics and machine algorithms.  Because of this training I was able to see something that once seen is quite simple to understand and to use.

 

The first publication of the concept of Subject Matter Indicator neighborhoods, and of its reduction to practice, is at:

 

http://www.ontologystream.com/beads/nationalDebate/sixteen.htm

 

We are looking for prior art, so that this invention can be placed in its proper historical context.  It is unlikely that a patent will be applied for.  Our interest is in keeping this knowledge unowned, i.e. in the public domain. 

 

It is December 13, 2003.  The core Founding Committee of knowledge scientists has many internal technologies that are waiting to be expressed and that, if expressed, would be helpful to citizen-centric government and to individual control over one’s information space. 

 

Without financial backing of some kind it is quite literally impossible to reveal these technologies in an orderly fashion, so we have developed an investment model based on our understanding of the BCNGroup Charter.

 

The InORB Technologies investment plan is at:

 

http://www.bcngroup.org/python3/ten.htm

 

This plan would spend $50,000 in 60 days to complete an ORB-based product, described below, and then turn this product completely over to the Investor.  A related dataRenewal LLP investment plan is not published, but represents a more aggressive version of the InORB plan.  This more aggressive plan would spend $250,000 in nine months to establish the infrastructure allowing the use of ORBs to index (from outside the agencies) all public Federal and State document repositories.  We have started with the FCC, since it was the FCC’s Stop Work Order on our now-terminated 12-week contract that brought the founding group into economic crisis (it does not take a lot when one is at the bottom of the economic scale).

 

http://www.ontologystream.com/beads/nationalDebate/home.htm

 

These two investment plans are designed to serve as a template for the formation and funding of many hundreds of Knowledge Technology companies, and to enhance their long-term viability, consistent with the 1997 BCNGroup Charter.  We talk about this as igniting the knowledge technology sector.

 

Objective of the investment plans

 

The objective of the dataRenewal investment plan is to deploy an existing technology, described in this tutorial, to

 

Create an infrastructure by which anyone with an interest can develop subject indicator characterizations that find any subject matter within any public repository of documents.  Specifically, the infrastructure will allow citizen-centric research into the subject matter discussed in FCC, FTC, SEC and other Federal public repositories.

 

This citizen-centric enabling technology is to be free for anyone to use over the term of a nine-month development phase.  At the end of the nine months the Investor will sell the entire company to the highest bidder. 

 

InOrb, OntologyStream, and Instant Index will provide a twenty-year license agreement to dataRenewal Inc.  However, the licensed technologies will not be encumbered in any way, and the licensors will remain free to compete with dataRenewal Inc or to develop other parts of the knowledge technologies sector.

 

As of December 13, 2003 the search interface is web based.  A portal into four versions of the tool looks like Figure 1.

 

 

Figure 1: Dec 13 2003 software interface

http://www.datarenewal.com/services.html

 

Each of the four versions is in the early stage of development.  However, one can already see why the Subject Matter Indicator neighborhoods are exciting. 

 

Some examples

 

The dataRenewal Inc search provides a standard full text search on 650 MB of FCC rulings (1997 – 2003).  These documents were autonomously harvested from the FCC public web site with an OSI software tool.  The Fables search is on a 151 K text collection of the 312 Aesop fables.  So we have the two extremes in terms of size.  This turns out to be an important differentiator between ORB technology and all other text-understanding technologies.  The ORBs can be developed using a collection of documents of any size.  Moreover, once interesting ORBs are found by visual inspection, the ORB structure provides information about the co-occurrence of terms relative to specific subjects.  Due to the consistency of language use within a community, this information can be re-used to acquire subject-specific information from unrelated document collections, provided the target document collection has a full text key word search engine. 

 

This means that if one uses the words in ORB neighborhoods from the FCC ORB as the key words of a web search (for example with www.google.com), one gets good retrieval results.  So the question is now how one develops good ORB neighborhoods that aid each individual in finding the information that he or she wants to find.  The company InOrb Technologies will, when funded for $50,000, provide a killer application that lives in the Internet and can be used by anyone to produce ORBs.  The revenue model could be based on superDistribution, a technology invented by Dr. Brad Cox:

 

www.virtualschool.edu

 

The application will be written in Perl and Python, so as to allow the Open Source community full access to the underlying technology. 

 

Currently we have some issues that will be cleared up as soon as the Instant Index Inc Software Development Kit (SDK) is completed (January 15th 2004).  But even now the current FCC ORB provides a reasonable means to acquire FCC rulings using a key word search.  The key word search can use the FCC search engine if one copies the words found in co-occurrence neighborhoods around key words (see the Figures), or one can use the search engine made available by dataRenewal Inc at www.dataRenewal.com
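
The copying step just described can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the function name neighborhood_query and the max_terms cutoff are hypothetical and are not part of the dataRenewal or FCC tools.  The sample words are the subtopics that this tutorial lists further on under "aggression" in the fables taxonomy.

```python
def neighborhood_query(center, neighbors, max_terms=5):
    """Build a plain key word query from a center term and its neighborhood.

    Hypothetical helper: the real tools may combine terms differently.
    """
    terms = [center]
    for word in neighbors:
        # Skip duplicates and stop adding once the query is long enough.
        if word not in terms and len(terms) < max_terms:
            terms.append(word)
    return " ".join(terms)


# Subtopics listed under "aggression" in the fables taxonomy.
aggression = ["attack", "battle", "bow", "prey", "protection", "ship", "weapons"]
query = neighborhood_query("aggression", aggression, max_terms=4)
print(query)
```

The resulting string can be pasted into any full text key word search engine.  Limiting the number of terms is a design choice made here for illustration, since long queries tend to over-constrain a key word search.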

 

The Upper Fixed Taxonomy for the fables is completed, whereas the FCC Upper Taxonomy is not (that was what I was contracted to do over the 12-week contract).  By selecting the Advanced Fables search we are provided with the first example of a Subject Matter Indicator neighborhood (Figure 2) used as a search aid.  Not all of the Upper Taxonomy is being shown as we have not had time to do this work.

 

 

Figure 2: The advanced search interface to the fables

 

The blue graph, an ORB neighborhood, is produced using a number of methods that are part of the curriculum that we have developed on knowledge technology.  However, the exact means of production is left to the research notes.  The user is encouraged to go to the URL:

 

http://www.inorb.com/advanced/aggression.html

 

and select several of the subtopics under aggression.  These are: 

 

{ attack, battle, bow, prey , protection, ship, weapons }

 

Each of these subtopics has a Subject Matter Indicator neighborhood.  The neighborhood is a topological construction on a graph of all terms that are indicated by a controlled vocabulary.  In the case of the fables the size of the controlled vocabulary is 55.  In the FCC ORBs the size is still 16,742, since we have not yet pruned away the terms that we do not want in a core Fixed Upper Taxonomy. 

 

This pruning was to occur using polling and survey instruments developed by OntologyStream under the now-terminated contract between OntologyStream and the FCC.  However, due to the FCC’s Stop Work Order, these instruments, while developed, cannot be used within the FCC.

 

Part of the dataRenewal development path would use outside FCC expertise to develop a number of Fixed Upper Taxonomies for the FCC public rulings.  This effort will take on the responsibility that the FCC legally has, but has failed to meet, and give average citizens control over high conceptual fidelity subject matter retrieval from the FCC repository.  Support for OntologyStream’s November 24th Waste, Fraud and Abuse complaint is requested and can be given by making a much needed donation to the BCNGroup fund for this purpose:

 

www.bcngroup.org

 

A copy of the complaint is available by request from portal@ontologystream.com

 

Clearly there is work still to be done, and this is why the core team of scientists is seeking a single investor to allow this work to be done in a reasonable context.  The current tutorial is simply trying to explain what we have done and what is left to be done. 

 

Figure 3:  FCC ORB structure

 

The ORB structure in Figure 3 provides to the user specific and exact information about the co-occurrence of words within the complete FCC public document repository.  The exact information is that for the word “voice-menu” there are exactly three other elements, out of a controlled vocabulary of 1,500 words selected by the FCC user community, that co-occur with it within two sentences.  These “significant words” are selected by the various communities and may vary depending on each user community’s needs.  Once an InORB Editor (a person) knows this controlled vocabulary, the Editor may run a conceptual roll-up process as described in the public domain notational papers. 
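
To make the idea of a co-occurrence neighborhood concrete, here is a minimal Python sketch.  It is not the method used to build the ORBs, which is left to the research notes.  It assumes a naive sentence splitter, treats "within two sentences" as "inside some run of two consecutive sentences", and uses a short sample text and a four-word vocabulary that were invented for this illustration and do not come from the FCC repository.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lower-cased word tokens; hyphenated words like 'voice-menu' stay whole."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence.lower())


def neighborhood(text, center, vocabulary, window=2):
    """Controlled-vocabulary words that share a run of `window`
    consecutive sentences with some occurrence of `center`."""
    sentences = [set(tokenize(s)) for s in split_sentences(text)]
    found = set()
    for start in range(len(sentences)):
        # Union of the words in this run of consecutive sentences.
        span = set().union(*sentences[start:start + window])
        if center in span:
            found |= span & vocabulary
    found.discard(center)
    return found


# Invented sample text and vocabulary, for illustration only.
sample = ("The voice-menu system was reviewed. "
          "Its inaccessibility was noted by the panel. "
          "A separate tariff matter followed. The tariff was approved.")
vocabulary = {"voice-menu", "inaccessibility", "tariff", "panel"}
neighbors = neighborhood(sample, "voice-menu", vocabulary)
print(sorted(neighbors))
```

In this sample "tariff" is excluded because it never falls inside a two-sentence run that also contains "voice-menu".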

 

A Trade Secret technique is then used to derive a topological cover of a large ORB by a smaller ORB having a fractal relationship to the larger one.  Once this process is completed, one is able to separate out the information that makes up the domain and user community specific ORB and write it out as a very small (< 20K) ASCII file.  At that point all linkage to the computer servers used previously is severed.  The ORB serves as a two-level broad-term / narrow-term subject matter taxonomy.  It can easily exist on a PDA.
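
The actual ORB file format is not described in this tutorial.  Purely as an illustration of how small a two-level broad-term / narrow-term taxonomy is when written as ASCII, the Python sketch below assumes a hypothetical layout of one broad term per line, followed by a tab and a comma-separated list of narrow terms.  The function names dump_orb and load_orb are likewise assumptions.  The sample entry uses the "aggression" subtopics from the fables.

```python
def dump_orb(taxonomy):
    """Serialize {broad_term: [narrow_term, ...]} as ASCII text.

    Hypothetical layout: broad<TAB>narrow1,narrow2,... one line per broad term.
    """
    lines = []
    for broad in sorted(taxonomy):
        narrow = ",".join(sorted(taxonomy[broad]))
        lines.append(f"{broad}\t{narrow}")
    text = "\n".join(lines) + "\n"
    text.encode("ascii")  # raises UnicodeEncodeError if not pure ASCII
    return text


def load_orb(text):
    """Inverse of dump_orb: rebuild the dictionary from the ASCII text."""
    taxonomy = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        broad, _, narrow = line.partition("\t")
        taxonomy[broad] = narrow.split(",") if narrow else []
    return taxonomy


fables = {"aggression": ["attack", "battle", "bow", "prey",
                         "protection", "ship", "weapons"]}
orb_text = dump_orb(fables)
print(len(orb_text.encode("ascii")), "bytes")
```

Because the file is plain text with no reference back to the indexing servers, it can be copied to any device and read with a few lines of code.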

 

In Figure 4 we show the results for the two word combination

 

{ voice-menu , inaccessibility }

 

There is only one file out of the 68,118 documents in the FCC public repository that has these two words occurring within a three-sentence window.
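
A search of this kind can be sketched as follows.  This is a minimal Python illustration, not the dataRenewal search engine: it assumes naive sentence splitting, reads "three-sentence window" as any run of three consecutive sentences, and runs on three tiny documents whose names and contents were invented for the example.

```python
import re


def sentence_words(text):
    """One set of lower-cased word tokens per sentence (naive splitting)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [set(re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", s.lower()))
            for s in sentences if s]


def cooccurs(text, word_a, word_b, window=3):
    """True if both words fall inside some run of `window` consecutive sentences."""
    sents = sentence_words(text)
    for start in range(len(sents)):
        span = set().union(*sents[start:start + window])
        if word_a in span and word_b in span:
            return True
    return False


def matching_documents(documents, word_a, word_b, window=3):
    """Names of the documents in which the two words co-occur."""
    return [name for name, text in documents.items()
            if cooccurs(text, word_a, word_b, window)]


# Invented documents, for illustration only.
documents = {
    "doc-a.txt": ("The voice-menu was tested. Users reported problems. "
                  "The inaccessibility of the service was noted."),
    "doc-b.txt": ("The voice-menu was tested. Nothing was found. "
                  "The case was closed. No more was said. "
                  "Later, inaccessibility came up in another matter."),
    "doc-c.txt": "This ruling concerns tariffs only.",
}
hits = matching_documents(documents, "voice-menu", "inaccessibility")
print(hits)
```

The second document contains both words but too far apart, so only the first document is returned.  This is the sense in which a two-word neighborhood can narrow a large repository down to a single file.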

 

 

Figure 4: The only part of a document within the FCC 1997-2003 public rulings with a specific two-word co-occurrence

 

The power of the ORB subject matter indicators can be seen in the results demonstrated in Figure 4.  We will continue this demonstration so that each person can experience for himself or herself the magic that is involved.

 

We return to the fable subject matter indicator shown in Figure 2.  The fables consist of a collection of short stories, each about 200 words long.  There are 312 of these stories in a text-understanding research repository developed by Prueitt in 1996. 

 

(to be continued..  12/16/2003 9:54 AM)  Please excuse any typos or unclear statements and send comments to paul@ontologystream.com