The Web Harvest for the FCC web site (years 1997 – 2003)

 

All text files from 1997 through 2003 in the FCC e-Docs database have been harvested for study as part of the taxonomies project currently underway.

 

Only files in plain text format were harvested (i.e. no PDF, HTML, or WordPerfect files). This should still account for a large percentage of all available documents, because the FCC almost always provides a text version as an alternative to the other formats it uses (e.g. a WordPerfect document was almost always converted to a text file as well). Files that were converted from another format contain a note, set off by asterisks, stating the format they were converted from, along with a disclaimer that certain formatting is lost during conversion.
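
As a rough illustration of how such converted files might be picked out of the harvested set, the sketch below looks for a run of text enclosed by asterisks near the top of a file. The exact layout of the FCC conversion note is not described here, so the pattern, the function name find_conversion_note, and the head_chars limit are all assumptions to be adjusted against real files.

```python
import re

# Assumption: the note is some text enclosed by runs of two or more
# asterisks. The real FCC layout may differ.
NOTE_PATTERN = re.compile(r"\*{2,}(.+?)\*{2,}", re.DOTALL)

def find_conversion_note(text, head_chars=2000):
    """Return the first asterisk-delimited note near the top, or None."""
    match = NOTE_PATTERN.search(text[:head_chars])
    if match is None:
        return None
    # Collapse line breaks and repeated spaces inside the note.
    return " ".join(match.group(1).split())
```

A file for which this returns None would, under the same assumption, be one that was originally published as text.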

 

The files were harvested using the FCC e-Docs yearly index files:

http://infoserver.fcc.gov/Document_Indexes/1997_annual_index.html

http://infoserver.fcc.gov/Document_Indexes/1998_annual_index.html

http://infoserver.fcc.gov/Document_Indexes/1999_annual_index.html

http://infoserver.fcc.gov/Document_Indexes/2000_annual_index.html

http://infoserver.fcc.gov/Document_Indexes/2001_annual_index.html

http://infoserver.fcc.gov/Document_Indexes/2002_annual_index.html

http://infoserver.fcc.gov/Document_Indexes/2003_annual_index.html

 

A program called "Memoweb 3" was used to read these index pages and download the text files they link to.
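
For readers without Memoweb, the same idea can be approximated with a short script. The following is only a sketch of the general approach, not a description of how Memoweb works: it builds the yearly index URL in the pattern listed above and collects every link on an index page whose target ends in ".txt". The names index_url, TextLinkParser, and text_links are illustrative, and the ".txt" suffix test is an assumption about how the text versions are linked.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

def index_url(year):
    """Yearly index URL, following the pattern of the list above."""
    return f"http://infoserver.fcc.gov/Document_Indexes/{year}_annual_index.html"

class TextLinkParser(HTMLParser):
    """Collects absolute URLs of links whose target ends in .txt."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: text versions are linked with a .txt suffix.
        if href and href.lower().endswith(".txt"):
            self.links.append(urljoin(self.base_url, href))

def text_links(index_html, base_url):
    """Return all .txt link targets found in one index page."""
    parser = TextLinkParser(base_url)
    parser.feed(index_html)
    return parser.links
```

The HTML of each index page would still need to be fetched (for example with urllib.request.urlopen) and passed to text_links along with the page's own URL, so that relative links resolve correctly.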

 

The text files for the various years have been compressed into ZIP files and put into the following locations:

http://bcngroup.homelinux.org/fcctext/FCC1997.zip

http://bcngroup.homelinux.org/fcctext/FCC1998.zip

http://bcngroup.homelinux.org/fcctext/FCC1999.zip

http://bcngroup.homelinux.org/fcctext/FCC2000.zip

http://bcngroup.homelinux.org/fcctext/FCC2001.zip

http://bcngroup.homelinux.org/fcctext/FCC2002.zip

http://bcngroup.homelinux.org/fcctext/FCC2003.zip
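
Once an archive has been downloaded, its contents can be read without unpacking to disk. The sketch below rebuilds the archive URLs listed above and reads the text members of one ZIP file from its raw bytes. It assumes that the members of interest end in ".txt" and decodes them as Latin-1, which never fails but may not match the actual encoding; the function names are illustrative.

```python
import io
import zipfile

BASE = "http://bcngroup.homelinux.org/fcctext/"
YEARS = range(1997, 2004)  # 1997 through 2003 inclusive

def archive_url(year):
    """Archive URL for one year, following the list above."""
    return f"{BASE}FCC{year}.zip"

def read_text_members(zip_bytes):
    """Map each .txt member name in the archive to its decoded text."""
    docs = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # Assumption: harvested documents carry a .txt suffix.
            if name.lower().endswith(".txt"):
                docs[name] = archive.read(name).decode("latin-1")
    return docs
```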

 

In addition, an index file was retrieved from the FCC public FTP server. Given a file's name, this index can be used to determine its categorization, title, and release date, which should give end users a quick and easy way to categorize any files they need to. This method is probably also more accurate than relying on the general overview of the naming conventions given below.
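
The lookup itself is a simple dictionary search once the index has been parsed. The layout of the FCC index file is not described here, so the sketch below assumes, purely for illustration, one record per line with pipe-separated fields in the order filename, category, title, release date. The field order, the delimiter, and the function names are all hypothetical and would need to be matched to the real file.

```python
import csv
import io
import os

# Hypothetical column order; adjust once the real index layout is known.
FIELDS = ("filename", "category", "title", "release_date")

def load_index(index_text, delimiter="|"):
    """Parse the index text into a dict keyed by lower-cased filename."""
    table = {}
    for row in csv.reader(io.StringIO(index_text), delimiter=delimiter):
        if len(row) < len(FIELDS):
            continue  # skip blank or malformed lines
        record = dict(zip(FIELDS, (cell.strip() for cell in row)))
        table[record["filename"].lower()] = record
    return table

def describe(table, path):
    """Look up a harvested file by name; returns None if not indexed."""
    return table.get(os.path.basename(path).lower())
```

Keying on the lower-cased base name lets a path taken from an extracted archive be looked up directly, regardless of directory or letter case.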

 

The index file has also been archived at:

http://bcngroup.homelinux.org/fcctext/FCCindex.zip

 

Although some naming conventions exist, they are followed less and less consistently in the later years. As a general guide, however, the common file name patterns and their meanings are listed at:

 

http://www.ontologystream.com/beads/nationalDebate/two-one.htm

 

Please feel free to e-mail me with any questions or comments about the harvested text set, the method of harvesting, or anything else you would like to ask.

 

Yours truly,

 

   Nathan Einwechter, OntologyStream Inc, Nov 2003