Running the SDK
To use the SDK, you need an instance of Database and an instance of Categorizer:
- Categorizer is used to categorize a domain.
- Database is used to get the metadata of URL or application categories.
The best way to understand how the SDK works is to look at the example code in ./src/main.go
Important
If you want to do categorization or detection using IP addresses instead of URLs, use Database.SetIpRangeMatcher to improve performance.
Main functions
src/main.go
This file contains example code that illustrates how to use the SDK to categorize URLs.
You can run the binary directly via Docker using ./scripts/client.sh, or manually using the Go toolchain.
NB: if you run the binary directly, at least two arguments are required: the Redis DSN, passed with -rdb-dsn, and at least one domain to categorize.
This code imports the package gitlab.olfeo.tech/data-tools/nexus/sdk/redis as rdb from the SDK. It then creates the structure redisDatabase, which implements the interface Database. This instance is passed to the function categorize.NewCategorizer, which returns the Categorizer used to categorize the domains.
It returns two pieces of information:
- the category of the domain, by calling Categorizer.GetDomainCategory and Database.GetCategoryInfo;
- the associated application (if it exists) and its category, by calling Categorizer.GetDomainApplications, Database.GetApplicationInfo and Database.GetApplicationCategoryInfo.
    sdk-sample$ cd src
    src$ go run main.go -rdb-dsn redis://localhost:6379 dropbox.com
    Using rdb categorizer from redis://localhost:6379:
    dropbox.com: (15181) Online Data Storage
    dropbox.com: application (27) Dropbox
    application category: (10005) Online Software providing cloud back-ups and data storage infrastructure
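The two-step lookup described above (resolve a category id, then fetch its metadata) can be sketched as follows. This is an illustration under stated assumptions, not the code of src/main.go: the two small interfaces copy only the method signatures listed in this README, while `CategoryInfo.Name`, `fakeBackend`, `demoBackend` and `describeWithFake` are hypothetical names introduced here so the sketch runs without Redis.

```go
package main

import (
	"context"
	"fmt"
)

// CategoryInfo is a hypothetical stand-in: the fields of the SDK's real
// struct are not shown in this README.
type CategoryInfo struct {
	Name string
}

// domainCategorizer and categoryStore mirror only the two methods used here.
type domainCategorizer interface {
	GetDomainCategory(ctx context.Context, domain string, urlPath string) (uint32, error)
}

type categoryStore interface {
	GetCategoryInfo(ctx context.Context, categoryId uint32) (*CategoryInfo, error)
}

// describeDomain resolves the category id of a domain, then its metadata.
func describeDomain(ctx context.Context, c domainCategorizer, db categoryStore, domain string) (string, error) {
	// Passing an empty urlPath for a bare domain is an assumption.
	id, err := c.GetDomainCategory(ctx, domain, "")
	if err != nil {
		return "", fmt.Errorf("categorize %s: %w", domain, err)
	}
	info, err := db.GetCategoryInfo(ctx, id)
	if err != nil {
		return "", fmt.Errorf("category info %d: %w", id, err)
	}
	return fmt.Sprintf("%s: (%d) %s", domain, id, info.Name), nil
}

// fakeBackend is an in-memory stand-in for both interfaces.
type fakeBackend struct {
	domains    map[string]uint32
	categories map[uint32]*CategoryInfo
}

func (f fakeBackend) GetDomainCategory(_ context.Context, domain, _ string) (uint32, error) {
	id, ok := f.domains[domain]
	if !ok {
		return 0, fmt.Errorf("domain %q not found", domain)
	}
	return id, nil
}

func (f fakeBackend) GetCategoryInfo(_ context.Context, id uint32) (*CategoryInfo, error) {
	info, ok := f.categories[id]
	if !ok {
		return nil, fmt.Errorf("category %d not found", id)
	}
	return info, nil
}

// demoBackend holds the single entry shown in the sample run above.
func demoBackend() fakeBackend {
	return fakeBackend{
		domains:    map[string]uint32{"dropbox.com": 15181},
		categories: map[uint32]*CategoryInfo{15181: {Name: "Online Data Storage"}},
	}
}

// describeWithFake runs the lookup against the in-memory backend.
func describeWithFake(domain string) (string, error) {
	b := demoBackend()
	return describeDomain(context.Background(), b, b, domain)
}

func main() {
	line, err := describeWithFake("dropbox.com")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(line)
}
```

Depending on narrow interfaces rather than concrete types keeps the lookup logic testable without a running Redis instance.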
sdk/categorize
To create an instance of Categorizer, call the function NewCategorizer from ./src/sdk/categorize/handler.go with an instance of Database.
    // Categorizer is the main interface that provides the categorization service
    type Categorizer interface {
        // GetDomainCategory returns the correct category id of the given domain or url
        GetDomainCategory(ctx context.Context, domain string, urlPath string) (uint32, error)

        // GetDomainApplications returns the application ids of the given domain;
        // when the domain is an IP, there can be multiple applications
        GetDomainApplications(ctx context.Context, domain string) ([]uint32, error)

        // GetAdvancedDomainInfo performs the same algorithm as GetDomainInfo but stores intermediate results
        GetAdvancedDomainInfo(ctx context.Context, domain string, urlPath string) (*AdvancedDomainInfo, error)
    }
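Because GetDomainCategory and GetAdvancedDomainInfo take the domain and the URL path as separate arguments, a caller that starts from a full URL has to split it first. A minimal sketch using only the standard library is shown below. `splitURL` is a hypothetical helper, not part of the SDK, and the exact form the SDK expects (port stripped or kept, leading slash on the path, query string ignored) is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// splitURL separates a raw URL or bare domain into a (domain, urlPath) pair.
// The port and the query string are dropped; whether the SDK wants them is
// an assumption.
func splitURL(raw string) (domain, urlPath string, err error) {
	// url.Parse treats "example.com/x" as a path only, so add a scheme
	// when none is present.
	if !strings.Contains(raw, "://") {
		raw = "https://" + raw
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", fmt.Errorf("parse %q: %w", raw, err)
	}
	if u.Hostname() == "" {
		return "", "", fmt.Errorf("no host in %q", raw)
	}
	return u.Hostname(), u.Path, nil
}

func main() {
	for _, raw := range []string{"dropbox.com", "https://www.example.com:8080/a/b?q=1"} {
		domain, path, err := splitURL(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("domain=%q urlPath=%q\n", domain, path)
	}
}
```

The resulting pair can then be passed as the `domain` and `urlPath` arguments of the methods above.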
sdk/database
To create an instance of Database, call the function NewDatabase from ./src/sdk/database/redis/connect.go with a context.Context and a redis.UniversalClient.
    type Database interface {
        // GetApplicationInfo returns metadata about the given applicationId
        //
        // Querying for a non-existent applicationId returns a sdk.NotInDatabase error
        GetApplicationInfo(ctx context.Context, applicationId uint32) (*ApplicationInfo, error)

        // GetDomainApplicationInfo returns application info about the given domain
        //
        // Querying for an unknown domain returns a sdk.NotInDatabase error
        GetDomainApplicationInfo(ctx context.Context, domain string) (*DomainApplicationInfo, error)

        // GetApplicationCategoryInfo returns metadata about the given applicationCategoryId
        //
        // Querying for a non-existent applicationCategoryId returns a sdk.NotInDatabase error
        GetApplicationCategoryInfo(ctx context.Context, applicationCategoryId uint32) (*ApplicationCategoryInfo, error)

        // GetCategoryInfo returns metadata about the given categoryId
        //
        // Querying for a non-existent categoryId returns a sdk.NotInDatabase error
        GetCategoryInfo(ctx context.Context, categoryId uint32) (*CategoryInfo, error)

        // GetThemeInfo returns metadata about the given themeId
        //
        // Querying for a non-existent themeId returns a sdk.NotInDatabase error
        GetThemeInfo(ctx context.Context, themeId uint32) (*ThemeInfo, error)

        // GetLogoData returns the byte sequence for the logo (as a 64x64 pixel PNG image)
        //
        // Querying for a non-existent logoId returns a sdk.NotInDatabase error
        GetLogoData(ctx context.Context, logoId uint32) ([]byte, error)

        // GetDomainInfo returns the domain info associated with a given domain
        //
        // Querying for an unknown domain returns a sdk.NotInDatabase error
        GetDomainInfo(ctx context.Context, domain string) (*DomainInfo, error)

        // GetCategoryInfoList returns a map of info on all categories in the database
        GetCategoryInfoList(ctx context.Context) (map[uint32]*CategoryInfo, error)

        // GetThemeInfoList returns a map of info on all themes in the database
        GetThemeInfoList(ctx context.Context) (map[uint32]*ThemeInfo, error)

        // GetThemeCategoryIds returns a list of category ids for a given theme
        //
        // Querying for an unknown theme returns an empty list
        GetThemeCategoryIds(ctx context.Context, themeId uint32) ([]uint32, error)

        // GetCategoryIds returns a list of all the categories in the database
        GetCategoryIds(ctx context.Context) ([]uint32, error)

        // GetThemeIds returns a list of all the themes in the database
        GetThemeIds(ctx context.Context) ([]uint32, error)

        // GetIpApplicationIds returns a list of the applications associated with the given IP
        GetIpApplicationIds(ctx context.Context, ip string) ([]uint32, error)

        // GetApplicationCategoryIds returns a list of all the application categories in the database
        GetApplicationCategoryIds(ctx context.Context) ([]uint32, error)

        // GetCategoryApplicationIds returns a list of all the applications in the given application category in the database
        GetCategoryApplicationIds(ctx context.Context, applicationCategoryId uint32) ([]uint32, error)

        // HealthCheck returns an error if the database is not working correctly.
        //
        // The returned error will wrap the actual backend error
        HealthCheck(ctx context.Context) error
    }
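Several methods of this interface document that an unknown id or domain yields a sdk.NotInDatabase error. Callers usually want to treat that case as a normal "no result" outcome and still propagate real backend failures. The sketch below shows one way to do that with `errors.Is`. How the SDK actually exposes NotInDatabase (sentinel value or error type) is not shown in this README, so `errNotInDatabase`, `lookupName` and `nameOrDefault` are local, hypothetical stand-ins.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotInDatabase is a local stand-in for the SDK's NotInDatabase error.
// Modelling it as a sentinel value is an assumption.
var errNotInDatabase = errors.New("not in database")

// lookupName simulates a metadata lookup against an in-memory table and
// wraps the sentinel with context when the id is missing.
func lookupName(table map[uint32]string, id uint32) (string, error) {
	name, ok := table[id]
	if !ok {
		return "", fmt.Errorf("category %d: %w", id, errNotInDatabase)
	}
	return name, nil
}

// nameOrDefault maps "not in database" to a default label and propagates
// every other error unchanged.
func nameOrDefault(table map[uint32]string, id uint32) (string, error) {
	name, err := lookupName(table, id)
	switch {
	case errors.Is(err, errNotInDatabase):
		return "unknown", nil
	case err != nil:
		return "", err
	}
	return name, nil
}

func main() {
	table := map[uint32]string{15181: "Online Data Storage"}
	for _, id := range []uint32{15181, 42} {
		name, err := nameOrDefault(table, id)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%d: %s\n", id, name)
	}
}
```

`errors.Is` walks the wrap chain, so the check keeps working when an error is wrapped with extra context on its way up.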
Detect applications by their IP
The SDK can detect applications by IP, but its default behavior may perform poorly: the default matching strategy is not targeted at this use case and looks for strict equality in the database rather than matching by range.
The method Database.SetIpRangeMatcher can be used to modify this behavior.
AutoUpdateMatcher, as the name suggests, updates the IP ranges automatically whenever they change in the database.
    ctx := context.Background()

    db, err := rdb.NewDatabase(ctx, redisClient)
    if err != nil {
        panic("initialize db rdb")
    }

    autoUpdateMatcher, err := matcher.NewAutoUpdateMatcher(ctx, db)
    if err != nil {
        panic("initialize iprange autoUpdateMatcher")
    }

    db.SetIpRangeMatcher(autoUpdateMatcher)

    // Then use `Database.GetIpApplicationIds` as usual.
Note: if staying up to date with database changes is not a requirement, automatic updates can be disabled by passing the option WithoutAutoUpdate to matcher.NewAutoUpdateMatcher.
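To illustrate the difference between strict equality and range matching described in this section, here is a small standard-library sketch. It does not reproduce the SDK's matcher package. `rangeEntry`, `exactLookup`, `rangeLookup` and the demo data are hypothetical, the addresses come from the documentation range 192.0.2.0/24, and the linear scan is chosen for brevity rather than speed.

```go
package main

import (
	"fmt"
	"net/netip"
)

// rangeEntry associates an IP range with application ids (hypothetical type).
type rangeEntry struct {
	prefix netip.Prefix
	appIDs []uint32
}

// exactLookup finds applications only when the IP string itself is a key.
func exactLookup(table map[string][]uint32, ip string) []uint32 {
	return table[ip]
}

// rangeLookup returns the ids of every range that contains the IP.
func rangeLookup(ranges []rangeEntry, ip string) ([]uint32, error) {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return nil, fmt.Errorf("parse ip %q: %w", ip, err)
	}
	var ids []uint32
	for _, r := range ranges {
		if r.prefix.Contains(addr) {
			ids = append(ids, r.appIDs...)
		}
	}
	return ids, nil
}

// demoExact stores a single address of the range as an exact key.
func demoExact() map[string][]uint32 {
	return map[string][]uint32{"192.0.2.1": {27}}
}

// demoRanges stores the whole /24 as one range.
func demoRanges() []rangeEntry {
	return []rangeEntry{
		{prefix: netip.MustParsePrefix("192.0.2.0/24"), appIDs: []uint32{27}},
	}
}

func main() {
	ip := "192.0.2.77"
	fmt.Println("exact:", exactLookup(demoExact(), ip))
	ids, err := rangeLookup(demoRanges(), ip)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("range:", ids)
}
```

With exact matching, an address that is not stored verbatim yields no application, while the range lookup resolves any address inside the stored prefix.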