Olfeo OEM documentation

Legacy file format

File format

Our legacy file format is LMDB.

Whatever language you are using, it should have a wrapper and/or library that allows you to run queries against LMDB databases. Such libraries exist for many popular languages, including Python, .NET, Java, PHP and C++, among others.

Responsibility disclaimer

This list is provided for information purposes only. We do not endorse or validate any of the above, and we decline any responsibility of any kind regarding the use of these libraries. You should do your own due diligence before using any of these libraries.

Retrieving files

To retrieve the files, please get in touch with your account manager.

Categorizing a FQDN

The correct way to use the Olfeo database is to query it from the deepest subdomain name toward the top-level domain, in sequence, until a match is found. This is the expected behavior: it allows the database to cover a large breadth of subdomains that share the same category while allowing some specific subdomains to have other categories.

For instance, to find the Olfeo category of spellcheck.gov.mn, you must first look up the complete domain, spellcheck.gov.mn, and, if it is not matched, then check gov.mn.
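The lookup order described above can be sketched in Python as follows. This is only an illustration: the function name lookup_candidates is hypothetical and not part of the Olfeo database or of any LMDB library, and the normalization step (lowercasing, dropping a trailing dot) is an assumption about how keys are stored.

```python
from typing import Iterator


def lookup_candidates(fqdn: str) -> Iterator[str]:
    """Yield the names to look up, from the full FQDN toward the top-level domain.

    Hypothetical helper for illustration only.
    """
    # Assumption: keys are lowercase and carry no trailing dot.
    domain = fqdn.strip().rstrip(".").lower()
    while domain:
        yield domain
        # Drop the leftmost (deepest) label; yields "" once no label is left.
        _, _, domain = domain.partition(".")


if __name__ == "__main__":
    for name in lookup_candidates("spellcheck.gov.mn"):
        print(name)
```

In practice the iteration stops at the first name that matches an entry in the database.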

A future version of our database may provide more precise categories for subdomains of gov.mn, so a lookup that does not match today may match in a later version of the database.

Please note that, for completeness, we provide a specific flag (called prefix_flag) with some domains. The presence of this flag indicates that the domain's own category does not apply to its subdomains. This allows the database to handle sites like blogspot, which can have many subdomains with varying subjects and categorisation.

So the algorithm should look like this:

Set is_parent to false
While not done
   Look up the current domain in the database
   If an entry is found
      If the entry does not have the prefix_flag set or is_parent is false
         Return the category of the entry
      Otherwise
         Consider the domain unclassified and stop
      End if
   Otherwise
      Set is_parent to true
      Remove the leftmost label (the deepest subdomain) from the domain
      If no more label in the domain
         Consider the domain unclassified and stop
      End if
   End if
End while
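
As a minimal sketch, the algorithm above could be written in Python as shown below. The database is stood in for by a plain dictionary. The Entry type, the categorize function and the category names are all hypothetical and chosen for illustration only. The actual key and value layout of the LMDB files is not described here, so a real implementation would replace the dictionary lookup with a query through an LMDB library.

```python
from typing import Mapping, NamedTuple, Optional


class Entry(NamedTuple):
    """Hypothetical record layout: a category plus the prefix_flag."""
    category: str
    prefix_flag: bool = False


def categorize(fqdn: str, db: Mapping[str, Entry]) -> Optional[str]:
    """Return the category for fqdn, or None if it is unclassified."""
    # Assumption: keys are lowercase and carry no trailing dot.
    domain = fqdn.strip().rstrip(".").lower()
    is_parent = False
    while domain:
        entry = db.get(domain)
        if entry is not None:
            # A flagged entry applies only to the exact domain queried,
            # never to its subdomains.
            if not entry.prefix_flag or not is_parent:
                return entry.category
            return None
        is_parent = True
        # Remove the leftmost label and retry with the parent domain.
        _, _, domain = domain.partition(".")
    return None


if __name__ == "__main__":
    # Illustrative data only; these are not real Olfeo categories.
    sample_db = {
        "gov.mn": Entry("government"),
        "blogspot.com": Entry("blogs", prefix_flag=True),
    }
    print(categorize("spellcheck.gov.mn", sample_db))
    print(categorize("foo.blogspot.com", sample_db))
```

With this sample data, spellcheck.gov.mn inherits the category of gov.mn, whereas foo.blogspot.com is treated as unclassified because the blogspot.com entry carries the prefix_flag.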