Taxonomy
Note 2: Taxonomy Development
Thursday, December 11, 2003
Drill down on types of Taxonomy
Purpose of Taxonomy: A fixed taxonomy is sought to drive long-term
document metadata production.
The Community is the Origin of Taxonomy: The stakeholder community knows the subject
matter. A comprehensive and commonly agreed-upon collection of subject matter
indicators can be acquired only from the stakeholder community, and only through
careful and complete enumeration processes.
Upper Taxonomy: A two-level Upper Taxonomy is required to be
stable because it will represent abstract classes of subject matter
indicators. Rigid control over the Upper Taxonomy works culturally because
the elements of the Upper Taxonomy are not specific to single instances.
Hidden Taxonomy: The Hidden Taxonomy, however, is often only
implicit and is often not seen by humans, either when documents are placed into
repositories or when repositories are searched and documents are retrieved.
Full text indexing will reflect the linguistic variation in the text. The
measurement of linguistic variation in text is one technical means of
implicitly capturing a functional Hidden Taxonomy.
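As a minimal sketch of what measuring linguistic variation could look like, the Python below counts how many documents each term appears in and keeps the terms shared by several documents as candidate implicit categories. The sample documents, the stopword list, and the function names are invented for illustration; this is a plain-Python stand-in and not the method used by any particular full text indexing product.

```python
import re
from collections import Counter

# Illustrative stopword list only; a real index would use a fuller one.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "for", "is", "on"}

def tokenize(text):
    """Lowercase the text and return its alphabetic tokens, minus stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def document_frequency(documents):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for text in documents:
        df.update(set(tokenize(text)))
    return df

def candidate_terms(documents, min_docs=2):
    """Return terms found in at least min_docs documents, sorted alphabetically."""
    df = document_frequency(documents)
    return sorted(term for term, n in df.items() if n >= min_docs)

# Hypothetical document titles, not taken from any real repository.
docs = [
    "Spectrum license renewal for the broadcast station",
    "Broadcast station ownership report",
    "License renewal fee schedule",
]
terms = candidate_terms(docs)
print(terms)
```

Terms that recur across documents are only candidates; under the approach described in this note, members of the stakeholder community would still inspect and refine them.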
Reification: The relationship between the Hidden Taxonomy and the
Upper Taxonomy is reified by community participation, both in the creation of
the Upper Taxonomy and in the guidance imposed on hidden processes by active user
community feedback. The term “reification” means “making machine knowledge
representation human-like”. Human-centered reification is efficient only
when some part of the stakeholder community inspects and refines the
elements in the Hidden Taxonomy.
Kinds of Taxonomy: A review of the literature on subject matter
taxonomy indicates that there are three kinds of metadata:
{ reuse metadata, retrieval metadata, tracking metadata }.
Each of these kinds of metadata is developed separately. One possibility
for the first level of the Upper Taxonomy is in fact these three categories
of metadata.
Value of a fixed Taxonomy: A fixed taxonomy is put into place to drive
document metadata production. Fixing the taxonomy provides specific benefits
to later retrieval, due to (1) standardization of terminology and
(2) anticipatory responses from users. In other words, the stakeholder community
develops expectations about how subject matter is classified because
terminology is standardized on a controlled vocabulary.
Limitations of a fixed Taxonomy: However, there is a trade-off: work and
communication efficiency are reduced when the controlled vocabulary
becomes rigid, or when the taxonomy does not reflect the true subject matter
content.
Evolution of Knowledge Flow models: If the fixed taxonomy has some tracking
metadata, then one is able to see the patterns of information flow. These
patterns can sometimes, but not always, be converted to classical pre-wired
workflow (completely controlled by explicit rules and procedures). In many
cases, the patterns of information flow can be captured in models.
Knowledge management specialists call these models knowledge flow
models.
If a fixed tracking metadata taxonomy is in use
for a short period of time, then the knowledge flow model will become revealing
and useful in realizing greater efficiencies in community communication and in
realizing greater transparency in those processes where transparency is
mandated by law.
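A rough sketch of how tracking metadata could reveal patterns of information flow: the Python below counts hand-offs between consecutive holders of each document. The record layout (document id, sequence number, holder) and the holder names are assumptions made for this example; they are not taken from any existing tracking system.

```python
from collections import Counter, defaultdict

def flow_counts(events):
    """Count hand-offs between consecutive holders of each document.

    events: iterable of (document_id, sequence_number, holder) tuples,
    a hypothetical layout for tracking metadata records.
    """
    by_doc = defaultdict(list)
    for doc_id, seq, holder in events:
        by_doc[doc_id].append((seq, holder))

    counts = Counter()
    for steps in by_doc.values():
        steps.sort()  # order each document's records by sequence number
        holders = [holder for _, holder in steps]
        for src, dst in zip(holders, holders[1:]):
            if src != dst:  # ignore records where the holder did not change
                counts[(src, dst)] += 1
    return counts

# Invented sample records for illustration.
events = [
    ("doc-1", 1, "intake"), ("doc-1", 2, "legal"), ("doc-1", 3, "records"),
    ("doc-2", 1, "intake"), ("doc-2", 2, "legal"),
    ("doc-3", 2, "records"), ("doc-3", 1, "intake"),
]
flows = flow_counts(events)
for (src, dst), n in flows.most_common():
    print(f"{src} -> {dst}: {n}")
```

Frequent hand-offs are candidates for a knowledge flow model; whether any of them should become pre-wired workflow remains a judgment for the community.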
Development of a dependable Retrieval Index: A fixed taxonomy has subject matter
indicators that point to concepts and processes. Members of a community of
practice anticipate that these concepts and processes will be the subject of
future queries. The community will develop usage practices that depend on this
specific and fixed metadata organization of information. The more comfortable
the community feels with the subject matter taxonomy, the more skill the
community will develop in using this taxonomy.
The notion of a controlled vocabulary is
relevant here because the terminology expressed in free text will change
over time and will include terminology elements that are managed by library
staff or information intermediaries. The controlled
vocabulary is often much larger than the subject matter taxonomy. The subject
matter taxonomy should have an interface with the controlled vocabulary used
in managing the functionality of full text search and retrieval. We
recommend that the notions of an Upper Taxonomy and a Hidden Taxonomy be
used.
The FCC taxonomy architecture will have a fixed
"Upper Taxonomy" with two levels and a managed Hidden Taxonomy with a
mediated interface over the implicit elements used by Verity and/or Autonomy.
The Hidden Taxonomy will have a controlled vocabulary of 300 to 400 terms,
and constructions using this controlled vocabulary will be mapped to elements
in the second level of the Upper Taxonomy. This mapping can be put into place
using the existing Autonomy and Verity products (which have already been
purchased by the FCC for integration into iManage).
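The shape of this architecture can be sketched in a few lines of Python: a fixed two-level structure, plus a table that maps constructions (sets of controlled vocabulary terms) to second-level elements. Everything named here is hypothetical: the second-level element names, the sample terms, and the functions are invented for illustration, and the first level simply uses the three kinds of metadata mentioned above as one possibility. The sketch does not use or represent any Verity or Autonomy interface.

```python
# Hypothetical two-level Upper Taxonomy: first level -> second-level elements.
UPPER_TAXONOMY = {
    "reuse": {"document type"},
    "retrieval": {"subject area"},
    "tracking": {"process stage"},
}

# Hypothetical constructions over the controlled vocabulary. Each one maps a
# set of terms to a (first level, second level) element.
CONSTRUCTIONS = [
    (frozenset({"license", "renewal"}), ("retrieval", "subject area")),
    (frozenset({"memo"}), ("reuse", "document type")),
    (frozenset({"received", "intake"}), ("tracking", "process stage")),
]

def validate(constructions, taxonomy):
    """Raise ValueError if a construction targets an element not in the taxonomy."""
    for terms, (first, second) in constructions:
        if second not in taxonomy.get(first, set()):
            raise ValueError(f"{sorted(terms)} maps to unknown element {first}/{second}")

def classify(terms, constructions):
    """Return the elements whose construction terms are all present in terms."""
    present = set(terms)
    return sorted({target for needed, target in constructions if needed <= present})

validate(CONSTRUCTIONS, UPPER_TAXONOMY)
result = classify(["memo", "license", "renewal", "fee"], CONSTRUCTIONS)
print(result)
```

Keeping the mapping in a separate table means the controlled vocabulary can change over time while the Upper Taxonomy stays fixed, which is the division of labor this note recommends.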
Document type = reuse taxonomy. Materials available from the BearingPoint
information and content audit demonstrate the importance of document type.
A standard taxonomy refined for the document types is available from previous
document management projects. These classifications are, taken as a whole,
an example of a reuse taxonomy.