Indexing
Common Methodology for Content Evaluation
Statement of Purpose: This White Paper is about methodologies and technologies that can be usefully applied to content evaluation.
General Issues: In general terms, we find the following kinds of issues:
1) Representation of content
2) Interpretation of content evaluation
3) Routing and Retrieval
4) Inference under uncertainty
5) Inference under incomplete information
6) Visualization
7) Collaboration
8) Presentation
9) Scalability of transactions
10) Software integrity
A common methodology can be put into place to guide technology development that supports content evaluation. This methodology addresses each of the general issues in a fashion that generalizes to other domains.
Arguments: There are several arguments for a common methodology for content evaluation.
First argument: a common methodology for content evaluation creates reusable benchmarks for technologies and algorithms.
Second argument: practical constraints impose certain classes of solutions. These solutions have much to do with industry-wide practices in software development, with algorithm maturity, and with common cultural conceptual grounding. We collectively know how to do certain things and we have certain types of tools. The match between what we can do and what might be done more optimally is a critical one.
Third argument: the development of reusable analysis regarding content evaluation aligns our thinking with a community of practice. Language use develops so that concepts can be understood more quickly. Lessons learned and best practices are accumulated.
Fourth argument: clarity of methodology provides continuity and productivity within the groups who are charged with designing, coding and implementing systems.
Case study: A client has a specific interest in Intellectual Property (IP) mining. IP mining is a domain where each of these general issues comes into play.
This client's objective is to organize, index and perhaps annotate all public Intellectual Property documentation. Once this objective is met, the client will process, retrieve and route information based on flexible analysis of evaluations made in variable and often unanticipated contexts.
As an example consider the following text:
A toothpaste tube supporting device for supporting a substantially flexible toothpaste tube having an open neck part through which toothpaste in the tube is ejected and an opposite end part, said toothpaste tube supporting device comprising a substantially plate like support member having a first part affixed to a supporting surface and a second part extending substantially perpendicularly from the first part and the supporting surface, said support member having a substantially semicircular groove formed in an edge thereof; a clamp member pivotally affixed to the second part of the support member for movement in the plane of said second part and having a substantially semicircular groove formed in an edge thereof and forming a circular groove with the groove formed in the support member when the clamp member is in closed position with the support member with the grooved edges of the members in juxtaposition for accommodating the neck part of a toothpaste tube; and a holding device for releasably maintaining the clamp member and the support member in closed substantially coplanar relation with each other and forming a substantially circular groove between them consisting of the semicircular grooves of the members for accommodating the neck part of a toothpaste tube secure between said members to enable the tube to be emptied.
Table 1: Sample IP text element
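To make the indexing and retrieval objective concrete, the following is a minimal sketch of an inverted index over text elements such as the one in Table 1. It is an illustration only and not the client's or any vendor's method. The element ids and the function names tokenize, build_index and retrieve are hypothetical, and the two texts are short excerpts of the element in Table 1.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(elements):
    """Map each token to the set of element ids whose text contains it."""
    index = defaultdict(set)
    for element_id, text in elements.items():
        for token in tokenize(text):
            index[token].add(element_id)
    return index

def retrieve(index, query):
    """Return the ids of elements that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Hypothetical ids; both texts are excerpts of the element in Table 1.
elements = {
    "element-1": "A toothpaste tube supporting device for supporting "
                 "a substantially flexible toothpaste tube",
    "element-2": "a clamp member pivotally affixed to the second part "
                 "of the support member",
}
index = build_index(elements)
print(sorted(retrieve(index, "toothpaste tube")))
print(sorted(retrieve(index, "support member")))
```

Exact token matching of this kind treats "supporting" and "support" as different words, which is one reason the case study goes on to consider statistical and latent semantic methods.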
An optimal strategy for achieving this objective will have to involve the use of widely understood algorithms, annotation methodology, and software components.
One could conceive of using an Autonomy Enterprise system since this would provide the following solutions to each of the general issues:
1) Representation of content: Representation of content would be via a Bayesian model, adjusted using the principle that rare words carry high value (the so-called Shannon informational principle).
2) Interpretation of content evaluation: Interpretation of content evaluation comes directly from individual users making ad hoc associations between text elements, text element classes and annotations. These associations are encoded in associative neural networks.
3) Routing and Retrieval: Routing and retrieval is an integrated part of the Autonomy profile-based representation of content. There are weaknesses here that can be strengthened using Latent Semantic Indexing (LSI) and LSI-type link analysis.
4) Inference under uncertainty: Inference under uncertainty is often not handled well by Autonomy due to its representational algorithms. These algorithms often do not get at latent semantic linkages, and uncertainty further reduces the validity of machine inference (such as retrieval).
5) Inference under incomplete information: Inference under incomplete information should be recognized using methodology that requires users to either deploy additional algorithms or make additional judgments (knowledge acquisition and annotation).
6) Visualization: Visualization methodologies are beginning to mature. These techniques are all characterized by the use of perceptual acuity to identify and promote patterns and classes of patterns into a knowledge framework. Examples of this are the Imagination Engine of Steve Thaler and the RIB process of Inmentia.
7) Collaboration: Adequate collaboration capabilities exist within the Autonomy software suite. However, the facilitation of collaboration needs reward structures, sometimes change management methodology and sometimes workflow restructuring.
8) Presentation: Presentation of collaborative and analytic tools can be acquired from Autonomy, Semio or a similar system. These systems have had large amounts of capital invested in making collaboration seamless. However, the purpose of the collaboration is to organize, index and annotate a large collection of semi-structured data. A browser technology will likely be designed and built based on XML streaming standards. This browser need not and should not be complex, and need not and should not be integrated with the collaborative suite. Humans need to shift back and forth between algorithmic systems. This opens up opportunities for personalization and ownership. A separation of work-product presentation and development presentation might make good practice.
9) Scalability of transactions: Scalability of transactions during the development process should not be the concern of the client. The client's concern is the development of work-product, not software. However, end-product browser technology is a value-added component that can be designed in two phases: (1) an early repository for work-product and (2) browser technology for end use of existing work-product.
10) Software integrity: Going with Autonomy or a similar enterprise system reduces the client's justified concern about software instability. Customized end-product browser technology has to be designed to be simple and consistent with industry standards like ASP (Active Server Pages) or JSP (JavaServer Pages) as well as with XML.
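The principle cited under item 1, that rare words carry high value, can be illustrated with Shannon self-information: a word with relative frequency p in a collection is assigned the weight -log2(p). The sketch below is an assumption about how such a weighting might be computed and is not a description of Autonomy's actual algorithm. The function names and the three toy phrases, loosely based on the text in Table 1, are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def information_weights(documents):
    """Return {word: -log2(p)}, where p is the word's share of all tokens."""
    counts = Counter()
    for text in documents:
        counts.update(tokenize(text))
    total = sum(counts.values())
    return {word: -math.log2(count / total) for word, count in counts.items()}

# Toy phrases for illustration only (assumed, not real corpus data).
documents = [
    "a toothpaste tube supporting device",
    "a clamp member affixed to the support member",
    "a holding device for the clamp member",
]
weights = information_weights(documents)

# Rarer words receive larger weights than frequent ones.
for word in ("member", "toothpaste"):
    print(word, round(weights[word], 2))
```

In this toy collection "toothpaste" occurs once and "member" three times, so "toothpaste" receives the larger weight. A production system would estimate frequencies from the full document collection rather than from a handful of phrases.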