DAML+OIL and RDF Schema representation of UNSPSC

This page contains references to both DAML+OIL and RDF Schema representations of the UNSPSC code.

about unspsc

UNSPSC is the Universal Standard Products and Services Classification. The UNSPSC Code is a coding system for classifying both products and services for use throughout the global marketplace. The management and development of the UNSPSC Code are coordinated by ECCMA, the Electronic Commerce Code Management Association. The current version consists of more than 16,000 terms.

The public version of the current code set can be downloaded for free.

translation

The translation involves several design choices. The first is the identifier to use in the ontology (i.e., the value of the rdf:ID attribute). To create a nice, browsable hierarchy of terms, it is attractive to use the UNSPSC title. However, titles may change in future versions, and so may the UNSPSC codes themselves. ECCMA provides the EGCI field as a unique and persistent identifier for categories. Below we provide both a file with titles as identifiers and a file with EGCI identifiers.

The second choice concerns the modeling of the UNSPSC categories. A trivial solution is to represent all terms of the different category types (Segment, Family, Class and Commodity) as instances of rdfs:Class. This results in a plain RDFS file. A more correct solution is to model the UNSPSC category types as subclasses of rdfs:Class and to make the actual UNSPSC terms instances of these subclasses. This meta-schema of our UNSPSC representation (i.e., the definition of the terms it uses) is specified in a separate file.
A practical problem with this solution is that most tools do not parse meta-schemas and therefore do not recognize instances of subclasses of rdfs:Class as classes. To solve this, we made all definitions instances of rdfs:Class and added an explicit type statement that specifies the UNSPSC category type. For example:

<rdfs:Class rdf:ID="Cats">
  <rdf:type rdf:resource="http://ontoview.org/schema/unspsc/1#Commodity" /> 
  <rdfs:subClassOf rdf:resource="#Livestock" /> 
  <unspsc:egci>000001</unspsc:egci> 
  <unspsc:code>10.10.15.01</unspsc:code> 
</rdfs:Class>
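Because every term follows the same pattern, this encoding can be generated mechanically. The Python sketch below is an illustration, not the author's convert.pl: it writes one term in the form shown above and derives the category type and the parent code from the dotted UNSPSC code. It assumes that a higher-level category is padded with 00 in the positions below its own level (so 10.10.15.00 would be the Class that contains 10.10.15.01). The function names are hypothetical, and mapping the parent code to the parent's identifier is left to the caller (passed in here as parent_ident).

```python
from xml.sax.saxutils import escape, quoteattr

# Namespace of the UNSPSC meta-schema, as used in the rdf:type example.
META_NS = "http://ontoview.org/schema/unspsc/1#"

# UNSPSC category types, from the most general to the most specific level.
LEVELS = ["Segment", "Family", "Class", "Commodity"]


def _level(parts):
    """Index of the last non-'00' pair, i.e. the level of the category.

    Assumption: a category above the Commodity level has '00' in all
    positions below its own level (e.g. '10.10.15.00' is a Class).
    """
    nonzero = [i for i, p in enumerate(parts) if p != "00"]
    if not nonzero:
        raise ValueError("code has no non-zero level: %r" % ".".join(parts))
    return nonzero[-1]


def category_type(code):
    """Map a dotted code such as '10.10.15.01' to its category type."""
    return LEVELS[_level(code.split("."))]


def parent_code(code):
    """Code of the enclosing category, or None for a Segment."""
    parts = code.split(".")
    level = _level(parts)
    if level == 0:
        return None
    parts[level] = "00"
    return ".".join(parts)


def class_element(ident, code, egci, parent_ident=None):
    """One UNSPSC term as an rdfs:Class with an explicit rdf:type statement."""
    lines = [
        "<rdfs:Class rdf:ID=%s>" % quoteattr(ident),
        "  <rdf:type rdf:resource=%s />" % quoteattr(META_NS + category_type(code)),
    ]
    if parent_ident is not None:
        lines.append("  <rdfs:subClassOf rdf:resource=%s />" % quoteattr("#" + parent_ident))
    lines.append("  <unspsc:egci>%s</unspsc:egci>" % escape(egci))
    lines.append("  <unspsc:code>%s</unspsc:code>" % escape(code))
    lines.append("</rdfs:Class>")
    return "\n".join(lines)
```

Calling class_element("Cats", "10.10.15.01", "000001", parent_ident="Livestock") reproduces the Cats element above; parent_code("10.10.15.01") returns "10.10.15.00", the code of the class that would be looked up to find the parent identifier.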

Note: strictly speaking, the resulting files are not valid RDF/XML, because the value of rdf:ID must be a legal XML name. The XML standard states that names must start with a letter, underscore or colon. The EGCI version, however, uses numbers as identifiers, and in the version with titles as identifiers some titles contain forbidden punctuation characters. Thanks to Sean Bechhofer for pointing this out.
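One possible workaround is to post-process each identifier before writing it out: replace characters that cannot appear in a name, and add a prefix when the identifier does not start with a letter or underscore. The Python sketch below does this under stated assumptions. The character set is a simplified, ASCII-only approximation of the XML name rules (the real productions also allow many non-ASCII characters), the colon is left out because rdf:ID values must be NCNames, and the "_" prefix is a hypothetical choice, not something the published files do.

```python
import re

# Simplified, ASCII-only approximation of the XML name rules. The colon is
# deliberately excluded because rdf:ID values must be NCNames (no colon).
NAME_START = "A-Za-z_"
NAME_CHARS = NAME_START + r"0-9.\-"

_VALID_ID = re.compile("[%s][%s]*" % (NAME_START, NAME_CHARS))
_INVALID_CHAR = re.compile("[^%s]" % NAME_CHARS)
_START_CHAR = re.compile("[%s]" % NAME_START)


def is_valid_id(value):
    """True if value can be used as an rdf:ID under the simplified rules."""
    return _VALID_ID.fullmatch(value) is not None


def to_valid_id(value, prefix="_"):
    """Make a UNSPSC title or EGCI number usable as an rdf:ID.

    Every disallowed character becomes '_'; the (hypothetical) prefix is
    added when the result would not start with a letter or underscore.
    """
    cleaned = _INVALID_CHAR.sub("_", value)
    if not _START_CHAR.match(cleaned):
        cleaned = prefix + cleaned
    return cleaned
```

With these rules the EGCI 000001 becomes _000001 and a title such as "Paper & pulp" becomes Paper___pulp. Because different titles can collapse to the same cleaned string, a converter using this approach would also need to detect and resolve duplicate identifiers.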

The Perl script that performs the translation is called convert.pl. Its usage is as follows:

Converts a csv representation (i.e., a plain text file with the ';'
as field delimiter) of the UNSPSC codelist to an RDF Schema or
DAML+OIL ontology. It prints to standard output. Valid options are:
  -d  use DAML+OIL syntax instead of RDF Schema syntax
  -m  use the member version of UNSPSC Code; default the public
      version is used.
  -e  use the EGCI code as identifier; default the "UNSPSC title"
      is used as rdf:ID (only valid if combined with 'm')
  -h  print this help message and exit
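For readers who want to experiment with the conversion outside Perl, the Python sketch below mirrors the command-line options listed above. It is not a port of convert.pl. The input column layout (code;title for the public version, code;title;egci for the member version) is an assumption, -m here only unlocks -e, and the output is reduced to bare class elements without the rdf:RDF wrapper, namespace declarations or the subclass hierarchy. The -h option is supplied automatically by argparse.

```python
import argparse
import csv
import sys
from xml.sax.saxutils import escape, quoteattr


def parse_args(argv=None):
    """Parse options modelled on the usage text of convert.pl."""
    parser = argparse.ArgumentParser(
        description="Convert a ';'-delimited UNSPSC code list to RDF Schema or DAML+OIL.")
    parser.add_argument("-d", action="store_true",
                        help="use DAML+OIL syntax instead of RDF Schema syntax")
    parser.add_argument("-m", action="store_true",
                        help="use the member version of the UNSPSC code")
    parser.add_argument("-e", action="store_true",
                        help="use the EGCI code as identifier (requires -m)")
    parser.add_argument("csvfile", nargs="?", default="-",
                        help="input file; '-' (default) reads standard input")
    args = parser.parse_args(argv)
    if args.e and not args.m:
        # As the usage text states, -e requires the member version (-m).
        parser.error("-e is only valid in combination with -m")
    return args


def read_rows(lines):
    """Yield the stripped fields of each non-empty ';'-delimited line."""
    for row in csv.reader(lines, delimiter=";"):
        if row:
            yield [field.strip() for field in row]


def render(rows, args):
    """Return one class element per row (hypothetical columns: code;title[;egci])."""
    tag = "daml:Class" if args.d else "rdfs:Class"
    out = []
    for row in rows:
        code, title = row[0], row[1]
        ident = row[2] if args.e else title
        out.append("<%s rdf:ID=%s>\n" % (tag, quoteattr(ident)))
        out.append("  <unspsc:code>%s</unspsc:code>\n" % escape(code))
        out.append("</%s>\n" % tag)
    return "".join(out)


def main(argv=None):
    """Entry point: read the CSV file (or stdin) and print the result."""
    args = parse_args(argv)
    if args.csvfile == "-":
        sys.stdout.write(render(read_rows(sys.stdin), args))
    else:
        with open(args.csvfile, newline="") as f:
            sys.stdout.write(render(read_rows(f), args))
```

Keeping parsing, reading and rendering in separate functions makes each step easy to test on its own; main(["-d", "codes.csv"]), with a hypothetical input file codes.csv, would print DAML+OIL class elements to standard output, just as convert.pl prints its result to standard output.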

result

The result of the translation can be found in the table below.

                Title as identifier     EGCI as identifier
RDF Schema      unspsc84-title.rdfs     unspsc84-egci.rdfs
DAML+OIL        unspsc84-title.daml     unspsc84-egci.daml

Note that these files are meant for downloading only. Do not link to them directly (or, even worse, use their URL as a namespace), because their names and locations will change!

experiences

The resulting files are quite big (around 4 MB!). It takes more than a minute to fully display a file in Internet Explorer. The files can also be opened with OilEd 2.2a, but this took about 20 minutes on a Pentium III 450 MHz computer. Surprisingly, once loaded, the ontology could still be browsed with reasonable performance. A beta version of a future OilEd release was much faster: it loaded the file in only 3 minutes.

Protege parsed the file in 4 minutes and allowed smooth browsing.

© 2002 Michel Klein

$Id: index.html,v 1.8 2002/02/26 08:52:32 mcaklein Exp $