Data Receipt

Describes the various aspects of Stroom (and Stroom Proxy) receiving data via its /datafeed or event endpoints.

1 - Feed Status Checking

The process of checking a Feed’s status on data receipt to determine what to do with that data.

Feed status checking is Stroom's legacy method for controlling data receipt. For a richer method of controlling data receipt, see Data Receipt Rules.

If the property stroom.receive.receiptCheckMode is set to FEED_STATUS, the Feed Status value that has been set on the Feed is used to determine the action to perform on that data.

Feed Status Values

A Feed can have the following Feed Status values:

  • Receive - All data for this Feed will be received into Stroom / Stroom Proxy.

  • Reject - All data for this Feed will be rejected. The client will receive an HTTP 406 error with the message 110 - Feed is not set to receive data.

  • Drop - All data for this Feed will be silently dropped by Stroom / Stroom Proxy, i.e. discarded and not stored. The client will receive an HTTP 200 response as if the data had been successfully received. Use this if you do not want the client to know that its data is being discarded.
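The mapping from Feed Status to client response can be sketched as follows. This is a hypothetical illustration of the behaviour described in the list above; the function name and types are not Stroom's actual API.

```python
# Hypothetical sketch: map a Feed Status to the HTTP response the client
# sees. Status names, codes and messages are taken from the list above.

def respond_for_feed_status(status: str) -> tuple[int, str]:
    """Return (http_status, message) for a given Feed Status."""
    if status == "Receive":
        return 200, "OK"        # data is received and stored
    if status == "Reject":
        return 406, "110 - Feed is not set to receive data"
    if status == "Drop":
        return 200, "OK"        # data silently discarded, client sees success
    raise ValueError(f"Unknown Feed Status: {status}")
```

Note that Receive and Drop are indistinguishable from the client's point of view; only Reject surfaces an error.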

Stroom Proxy

Stroom Proxy is also able to perform Feed status checking. Stroom Proxy does not have direct access to the Feed settings, so it performs the check by making a request to a downstream Stroom Proxy or Stroom. If a Stroom Proxy receives a Feed status check request, it will proxy that request to its own downstream Stroom / Stroom Proxy.

Stroom Proxy will cache the response it gets from the downstream, so that it doesn’t need to make a call for every stream received.
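The caching behaviour can be illustrated with a minimal time-limited cache. This is a sketch of the general technique only; the class and its parameters are hypothetical and not Stroom Proxy internals.

```python
# Illustrative time-limited cache: a downstream feed status call is only
# made when there is no fresh cached entry for the feed.
import time

class TtlCache:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._entries: dict[str, tuple[float, str]] = {}

    def get_or_load(self, feed: str, loader) -> str:
        now = time.monotonic()
        hit = self._entries.get(feed)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]                 # fresh cached status, no call made
        status = loader(feed)             # call the downstream
        self._entries[feed] = (now, status)
        return status
```

With a 60 second TTL, a burst of streams for the same feed results in a single downstream call rather than one per stream.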

To configure Stroom Proxy for Feed status checking you need to set the following properties:

proxyConfig:

  receive:
    # The action to take if there is a problem with the data receipt rules, e.g.
    # Stroom Proxy has been unable to contact Stroom to fetch the rules.
    fallbackReceiveAction: "RECEIVE"
    receiptCheckMode: "FEED_STATUS"

  downstreamHost:
    # The API key to use for authentication (unless OpenID Connect is being used)
    apiKey: null
    # The hostname of the downstream
    hostname: null
    # The port to connect to the downstream on
    # If not set, will default to 80/443 depending on scheme.
    port: null
    # The scheme to connect to the downstream on
    scheme: "https"

2 - Data Receipt Rules

Describes the process of creating Data Receipt Rules to control whether data received by Stroom or Stroom Proxy is Received, Rejected or Dropped.

Data Receipt Rules serve as an alternative to the legacy Feed status checking performed by Stroom Proxy and Stroom. They provide a much richer mechanism for controlling which received data streams are Received, Rejected or Dropped, allowing anyone with the Manage Data Receipt Rules Application Permission to create one or more rules to control the receipt of data.

Data Receipt Rules can be accessed as follows:

Administration
Data Receipt Rules

Each rule is defined by a boolean expression (as used in Dashboards and Stream filtering) and the Action (Receive, Reject, Drop) that will be performed if the data matches the rule. Rules are evaluated in ascending order by Rule Number and the action is taken from the first rule to match.

If no rules match then the data will be rejected by default, i.e. the rules are include rather than exclude filters. If you want data to be received if no rules match then you can create a rule at the end of the list with an Action of Receive and no expression terms.
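The first-match evaluation with a reject-by-default fallback can be sketched as follows. The rule and field names here are illustrative only, not Stroom's implementation.

```python
# Sketch of first-match rule evaluation against a stream's meta data,
# rejecting by default when no rule matches.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    number: int
    matches: Callable[[dict], bool]   # boolean expression over meta data
    action: str                       # "Receive", "Reject" or "Drop"

def evaluate(rules: list[Rule], meta: dict) -> str:
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.matches(meta):
            return rule.action        # first matching rule wins
    return "Reject"                   # no rule matched: reject by default

rules = [
    Rule(1, lambda m: m.get("Feed") == "FEED_XYZ", "Drop"),
    Rule(2, lambda m: m.get("System") == "PROD", "Receive"),
]
```

A final catch-all rule (always matching, Action of Receive) would change the default from reject to receive, as described above.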

If a stream matches a rule that has a Receive action, it will still be subject to a check that the Feed actually exists. This means that the rules do not need to contain terms to cover every Feed in the system. The client will receive a 101 Feed is not defined error if the Feed does not exist.

images/user-guide/data-receipt/ReceiptRules.png

The screen operates in a similar way to Data Retention Rules in that rules can be moved up/down to change their importance, or enabled/disabled.

Fields

The fields available to use in the expression terms can be defined in the Fields tab. The terms will be evaluated against the stream’s meta data, i.e. a combination of the HTTP headers sent by the client and any that have been populated by Stroom Proxy or Stroom. This allows for the use of custom headers to aid in the filtering of data into Stroom.

images/user-guide/data-receipt/ReceiptRuleFields.png

Dictionaries are supported for use with the in dictionary condition. The contents of the dictionary and any of the dictionaries that it inherits will be included in the data fetched by Stroom Proxy.

Stroom Configuration

Data Receipt Rules are controlled by the following configuration:

appConfig:
  receiptPolicy:
    # List of fields whose values will be obfuscated when the rules
    # are fetched by Stroom Proxy
    obfuscatedFields:
    - "AccountId"
    - "AccountName"
    - "Component"
      # ... truncated
    - "UploadUserId"
    - "UploadUsername"
    - "X-Forwarded-For"
    # The hash algorithm used to hash obfuscated values, one of:
    # * SHA3_256
    # * SHA2_256
    # * BCRYPT
    # * ARGON_2
    # * SHA2_512
    obfuscationHashAlgorithm: "SHA2_512"
    # The initial list of fields to bootstrap a Stroom environment.
    # Changing this has no effect once an environment has been started up.
    receiptRulesInitialFields:
      AccountId: "Text"
      Component: "Text"
      Compression: "Text"
      content-length: "Text"
      # ... truncated
      Type: "Text"
      UploadUsername: "Text"
      UploadUserId: "Text"
      user-agent: "Text"
      X-Forwarded-For: "Text"
  receive:
    # The action to take if there is a problem with the data receipt rules, e.g.
    # Stroom Proxy has been unable to contact Stroom to fetch the rules.
    fallbackReceiveAction: "RECEIVE"
    # The data receipt checking mode, one of:
    # * FEED_STATUS - Use the legacy Feed Status Check method
    # * RECEIPT_POLICY - Use the new Data Receipt Rules
    # * RECEIVE_ALL - Receive ALL data with no checks
    # * DROP_ALL - Drop ALL data with no checks
    # * REJECT_ALL - Reject ALL data with no checks
    receiptCheckMode: "RECEIPT_POLICY"

Stroom Proxy Configuration

proxyConfig:
  receiptPolicy:
    # Only set this if you need to supply a non-standard full url
    # By default Proxy will use the known path for the Data Receipt Rules resource
    # combined with the host/port/scheme from the `downstreamHost` config property.
    receiveDataRulesUrl: null
    # The frequency that the rules will be fetched from the downstream Stroom instance.
    syncFrequency: "PT1M"

  # Identical configuration to Stroom as described above.
  # Stroom and Stroom Proxy can use different `receiptCheckMode` values, but typically
  # they will be the same.
  receive:

Stroom Proxy Rule Synchronisation

If Stroom Proxy is configured with receiptCheckMode set to RECEIPT_POLICY and has downstreamHost configured, then it will periodically send a request to Stroom to fetch the latest copy of the Data Receipt Rules. If Stroom Proxy is unable to contact Stroom it will use the latest copy of the rules that it has.

Given that Stroom Proxy will only synchronise periodically, once a change is made to the rule set, there will be a delay before the new rules take effect.

Term Value Obfuscation

As a Stroom administrator you may not want the values used in the Data Receipt Rule expression terms to be visible when they are fetched by a remote Stroom Proxy (that may be maintained by another team). It is therefore possible to obfuscate the values used for the expression terms for certain configured fields. The fields that are obfuscated are controlled by the property stroom.receiptPolicy.obfuscatedFields.

For example, in the default configuration, Feed is an obfuscated field. Thus a term like Feed != FEED_XYZ would have its value obfuscated when fetched by Stroom Proxy. Stroom Proxy obfuscates the received meta data values for obfuscated fields in the same way, allowing it to evaluate the rule expression.

This prevents the Stroom Proxy administrator from being able to see the values used in the rules as they are not in plain text. Each value is salted with its own unique salt then hashed. The hash algorithm can be configured using stroom.receiptPolicy.obfuscationHashAlgorithm.
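A minimal sketch of per-value salted hashing, using SHA2-256 from the standard library. The salt/hash encoding and the way salt and value are combined are assumptions for illustration, not Stroom's actual scheme.

```python
# Each term value gets its own random salt; the proxy can test an incoming
# meta value by hashing it with the same salt and comparing digests,
# without ever seeing the original plain-text value.
import hashlib
import secrets

def obfuscate(value: str) -> tuple[str, str]:
    """Return (salt_hex, hash_hex) for a term value."""
    salt = secrets.token_bytes(16)            # unique salt per value
    digest = hashlib.sha256(salt + value.encode("utf-8")).hexdigest()
    return salt.hex(), digest

def matches(candidate: str, salt_hex: str, hash_hex: str) -> bool:
    """Proxy-side check: hash the received meta value with the same salt."""
    digest = hashlib.sha256(
        bytes.fromhex(salt_hex) + candidate.encode("utf-8")).hexdigest()
    return digest == hash_hex
```

Because each value is salted individually, identical values in different terms produce different hashes, preventing simple frequency analysis of the rule set.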

3 - Feed Name Generation

The auto-generation of Feed names using a Feed name template and various header values.

Auto-generation of Feed names allows Stroom and Stroom Proxy to generate the Feed name based on a configured template and the values of various mandatory and optional headers. This feature was conceived for Data Feed Identities but can be used in isolation if required.

When the property (app|proxy)Config.receive.feedNameGenerationEnabled is set to true, the Feed header is no longer required on data receipt and auto-generation of a Feed name will be attempted.

When data is supplied without the Feed header, the meta keys specified in (app|proxy)Config.receive.feedNameGenerationMandatoryHeaders become mandatory. If the mandatory headers are not supplied, the data will be rejected.

The property (app|proxy)Config.receive.feedNameTemplate is used to control the format of the generated Feed name. The template uses values from the headers, so should be configured in tandem with .receive.feedNameGenerationMandatoryHeaders, though it can also use optional headers that the client may or may not supply.

If a template parameter is not present in the headers, it will be replaced with an empty string. The variables in the template (e.g. ${accountId}) are case-insensitive.

If enabled, Feed name generation happens on data receipt in both Stroom-Proxy and Stroom. You should therefore ensure the configuration for this feature is identical in Stroom and Stroom-Proxy.

The default configuration for Feed name generation is:

appConfig|proxyConfig: # applicable to both appConfig: and proxyConfig:
  receive: 
    ...
    feedNameGenerationEnabled: false
    feedNameGenerationMandatoryHeaders:
    - "AccountId" # A unique identifier for the owner of the system sending the data.
    - "Component" # The system/component that is sending the data (an account may have multiple).
    - "Format" # The data format (e.g. XML, JSON, etc.).
    - "Schema" # The schema that the data conforms to (e.g. event-logging).
    feedNameTemplate: "${accountid}-${component}-${format}-${schema}"

Assuming the above default configuration and that the client sends the following headers:

AccountId: 1234
Component: av-scanner
Format: XML
Schema: event-logging

This will result in an auto-generated Feed name of 1234-AV_SCANNER-XML-EVENT_LOGGING.
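The substitution can be sketched as below. Parameter lookup is case-insensitive and a missing header becomes an empty string, as described above; the normalisation step (upper-case, non-alphanumeric characters to underscores) is inferred from the example output, not taken from Stroom's source.

```python
# Sketch of Feed name generation from a template and client headers.
import re

def generate_feed_name(template: str, headers: dict[str, str]) -> str:
    lower = {k.lower(): v for k, v in headers.items()}

    def substitute(match: re.Match) -> str:
        value = lower.get(match.group(1).lower(), "")   # missing -> ""
        # Assumed normalisation, inferred from the example output:
        return re.sub(r"[^A-Za-z0-9]", "_", value).upper()

    return re.sub(r"\$\{([^}]+)\}", substitute, template)

headers = {"AccountId": "1234", "Component": "av-scanner",
           "Format": "XML", "Schema": "event-logging"}
print(generate_feed_name(
    "${accountid}-${component}-${format}-${schema}", headers))
# 1234-AV_SCANNER-XML-EVENT_LOGGING
```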

4 - Content Templates

Describes how Stroom can auto-generate content (i.e. Feeds and Pipelines) upon receipt of new data.

The aim of the Content Templates feature is to simplify the process of client systems sending data into Stroom. Instead of having to pre-create a Feed and Pipeline before a client can send data, Content Templates can be created to auto-create the content on receipt of the first Stream.

Content Templates are a set of expression rules with associated template content to generate when the rule matches on incoming data. If a client has used the correct headers and a Content Template matches, all the content required to process the data will be created and the data will be processed without any further involvement from the Stroom administrator.

In order to use Content Templates, the property appConfig.autoContentCreation.enable must be set to true.

Content Templates Screen

Content Templates can be managed in the Content Templates screen that is accessed from the main menu:

Administration
Content Templates
images/user-guide/data-receipt/ContentTemplates.png

The Content Templates screen

This screen allows a user with the Manage Content Templates application permission to create a number of content templates.

The settings available on a Content Template are as follows:

Template Name
A name for the template to aid the administrator when looking through a list of different templates.
Description
An optional and more detailed description of the purpose of the template.
Template Type
Determines how the Pipeline specified by the Pipeline setting is used.
  • INHERIT_PIPELINE - A new pipeline will be created that inherits from the pipeline specified by Pipeline. The new pipeline will be created in the explorer tree folder defined by (app|proxy)Config.receive.destinationExplorerPathTemplate.

  • PROCESSOR_FILTER - A new processor filter will be added to the existing pipeline specified in the template. No new documents will be created.

Copy Pipeline Element Dependencies
If Copy Pipeline Element Dependencies is ticked and the Template Type is INHERIT_PIPELINE, any documents that are direct dependencies of the specified Pipeline (e.g. Text Converter or XSLT ) will be copied into the destination folder. The new Pipeline will have its dependencies changed to use the copied dependencies, allowing them to be edited without affecting the parent Pipeline.
Pipeline
An existing Pipeline to either inherit from or add a processor filter to, depending on the Template Type.
Processor Priority
The priority to assign to the pipeline processor when created. The higher the number the higher the priority. Value must be between 1 and 100. The default priority is 10.
Processor Max Concurrent Tasks
The maximum number of concurrent tasks to assign to the pipeline processor when created. Zero means unbounded.
Expression
Each template has an expression that will be used to match on the headers when auto-generation of content has been triggered. The template expressions are evaluated in order from the top; the first to match the data is used.

If a template’s expression matches, content will be created according to settings in the template.

Configuration

The configuration for the Content Templates can be found here.

Content Auto-Creation

Depending on the configuration and the settings in the Content Template that matches on the data, the following will happen if the Feed does not already exist. If the Feed already exists, it is assumed that content creation has already happened or has been done manually, so nothing will happen.

INHERIT_PIPELINE Mode

  • Create a stroom user for the authenticated identity that has sent the data.

  • Create a stroom user group using the template defined by property groupTemplate.

    • Add the created stroom user to this group.
    • Add this group to the group defined by groupParentGroupName.
  • If Copy Pipeline Element Dependencies is ticked:

    • Create a stroom user group using the template defined by property additionalGroupTemplate.
      • Add the created stroom user to this group.
    • If additionalGroupParentGroupName is defined and doesn’t exist:
      • Create the Stroom user group specified in this property.
  • Create an explorer tree folder using the template defined by property destinationExplorerPathTemplate.

    • Grant VIEW permission to the created group.
    • Grant VIEW permission to the created additional group.
  • If Copy Pipeline Element Dependencies is ticked:

    • Create an explorer tree sub folder using the template defined by property destinationExplorerSubPathTemplate.
      • Grant VIEW permission to the created group.
      • Grant EDIT permission to the created additional group.
  • Create a Feed in the folder defined by destinationExplorerPathTemplate.

    • Grant VIEW permission to the created group.
    • Grant VIEW permission to the created additional group (if Copy Pipeline Dependencies is ticked).
  • Create a Pipeline in the folder defined by destinationExplorerPathTemplate and set it to inherit from the Pipeline defined in the Content Template.

    • Grant VIEW permission to the created group.

    • If Copy Pipeline Element Dependencies is ticked:

      • Copy the dependency documents of the parent Pipeline into this folder.
      • Grant VIEW permission to the created additional group.
    • Create a Processor Filter on the new Pipeline (using the priority and concurrency setting taken from the Content Template) with the following expression:

      Feed is X AND Type = Y

      [Where X is the Feed created above and Y is the stream type of the received data.]

  • If groupParentGroupName is defined:

    • Create the Stroom user group specified in this property if it doesn’t exist.
    • Add the group defined by groupTemplate to this group.
  • If Copy Pipeline Element Dependencies is ticked and additionalGroupParentGroupName is defined:

    • Create the Stroom user group specified in this property if it doesn’t exist.
    • Add the group defined by additionalGroupTemplate to this group.

Copy Dependencies Example

The following is an example of the content that will be created with the following assumptions:

  • The Feed name is 1234-AV_SCANNER-XML-EVENT_LOGGING.
  • AccountId: 1234 in the Meta data.
  • Copy Pipeline Element Dependencies is ticked on the Content Template.
  • Default autoContentCreation configuration.
System
  Feeds (Administrators: OWNER)
    1234 (Administrators: OWNER, grp-1234: VIEW, grp-1234-dev: VIEW)
      Content (Administrators: OWNER, grp-1234: VIEW, grp-1234-dev: EDIT)
        1234-AV_SCANNER-XML-EVENT_LOGGING-dsParser (Administrators: OWNER, grp-1234: VIEW, grp-1234-dev: EDIT)
        1234-AV_SCANNER-XML-EVENT_LOGGING-translationFilter (Administrators: OWNER, grp-1234: VIEW, grp-1234-dev: EDIT)
      1234-AV_SCANNER-XML-EVENT_LOGGING (Administrators: OWNER, grp-1234: VIEW, grp-1234-dev: VIEW)
      1234-AV_SCANNER-XML-EVENT_LOGGING (Administrators: OWNER, grp-1234: VIEW, grp-1234-dev: VIEW)

Don’t Copy Dependencies Example

The following is an example of the content that will be created with the following assumptions:

  • The Feed name is 1234-AV_SCANNER-XML-EVENT_LOGGING.
  • AccountId: 1234 in the Meta data.
  • Copy Pipeline Element Dependencies is NOT ticked on the Content Template.
  • Default autoContentCreation configuration.
System
  Feeds (Administrators: OWNER)
    1234 (Administrators: OWNER, grp-1234: VIEW)
      1234-AV_SCANNER-XML-EVENT_LOGGING (Administrators: OWNER, grp-1234: VIEW)
      1234-AV_SCANNER-XML-EVENT_LOGGING (Administrators: OWNER, grp-1234: VIEW)

PROCESSOR_FILTER Mode

  • Create a stroom user for the authenticated identity that has sent the data.

  • Create a stroom user group using the template defined by property groupTemplate.

    • Add the created stroom user to this group.
  • Create an explorer tree folder using the template defined by property destinationExplorerPathTemplate.

    • Grant VIEW permission to the created group.
  • Create a Feed in the folder defined by destinationExplorerPathTemplate.

    • Grant VIEW permission to the created group.
  • Create a Processor Filter on the new Pipeline (using the priority and concurrency setting taken from the Content Template) with the following expression:

    Feed is X AND Type = Y

    [Where X is the Feed created above and Y is the stream type of the received data.]

  • If groupParentGroupName is defined:

    • Create the Stroom user group specified in this property if it doesn’t exist.
    • Add the group defined by groupTemplate to this group.

Example

The following is an example of the content that will be created with the following assumptions:

  • The Feed name is 1234-AV_SCANNER-XML-EVENT_LOGGING.
  • AccountId: 1234 in the Meta data.
  • Default autoContentCreation configuration.
System
  Feeds (Administrators: OWNER)
    1234 (Administrators: OWNER, grp-1234: VIEW)
      1234-AV_SCANNER-XML-EVENT_LOGGING (Administrators: OWNER, grp-1234: VIEW)

Expression Fields

When creating the expression in a Content Template, the user will be limited to a set of fields to match on. These fields will be matched against the meta data of the Stream. The list of fields that can be used are configured using the property .autoContentCreation.templateMatchFields.

5 - Data Feed Identities

Data Feed Identities are an authentication mechanism designed specifically for the /datafeed API.

Data Feed Identities are a new authentication mechanism for data receipt into both Stroom-Proxy and Stroom. Each identity combines an authentication credential with a pre-defined set of static meta entries.

There are currently two types of Data Feed Identities:

  • Data Feed Keys - Similar to an API Key.
  • Certificate Identities - Uses an X509 Distinguished Name for authentication.

Both types of identity are written to one or more files that are placed on the Stroom or Stroom Proxy host, in a directory configured by .receive.dataFeedIdentitiesDir.

The following is an example of a file containing one of each type:

{
  "dataFeedIdentities" : [ {
    "type" : "DATA_FEED_KEY",
    "expiryDateEpochMs" : 1775237109581,
    "hash" : "$2a$10$JdngdVGxg6RGBeerku.JNusZdyyh4rNHYN5UeNKXRVdNUSNbg3NP6",
    "hashAlgorithm" : "BCRYPT_2A",
    "salt" : "$2a$10$JdngdVGxg6RGBeerku.JNu",
    "streamMetaData" : {
      "AccountId" : "1000",
      "MetaKey2" : "MetaKey2Val-1000",
      "MetaKey1" : "MetaKey1Val-1000"
    }
  }, {
    "type" : "CERTIFICATE_DN",
    "certificateDn" : "/DC=com/DC=example/DC=corp/OU=Users/CN=John Doe 2/emailAddress=john_doe@example.com",
    "expiryDateEpochMs" : 1775237109581,
    "streamMetaData" : {
      "AccountId" : "2002",
      "MetaKey2" : "MetaKey2Val-2002",
      "MetaKey1" : "MetaKey1Val-2002"
    }
  } ]
}

A file can contain zero or more identities of either type and the directory can contain zero or more of these files. This allows for, say, generating Data Feed Keys with a lifetime of 26 hours, adding a new file every day and deleting files older than two days.

The file(s) will be read on boot and all hashed keys will be stored in memory for receipt authentication. Files added to this directory while Stroom-Proxy/Stroom is running will be read and added to the in-memory store of hashed keys. Files deleted from this directory will result in all entries associated with the file path being removed from the in-memory store of hashed keys.
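The loading model can be sketched as a map keyed by file path, so that deleting a file removes exactly the entries it contributed. The file layout follows the JSON example above; everything else here (including the assumption of a .json extension) is illustrative.

```python
# Sketch: load identity files from a directory into an in-memory store
# keyed by file path, so a deleted file's entries can be dropped cleanly.
import json
from pathlib import Path

def load_identities(directory: str) -> dict[str, list[dict]]:
    """Map each file path to the identities it contains."""
    store: dict[str, list[dict]] = {}
    for path in sorted(Path(directory).glob("*.json")):
        with open(path) as f:
            doc = json.load(f)
        store[str(path)] = doc.get("dataFeedIdentities", [])
    return store

def remove_file(store: dict[str, list[dict]], path: str) -> None:
    """Drop all entries associated with a deleted file."""
    store.pop(path, None)
```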

Common properties

The following JSON properties are common to both types:

  • type - The type of the identity, one of (DATA_FEED_KEY|CERTIFICATE_DN).

  • expiryDateEpochMs - The time the identity expires expressed as milliseconds since the epoch.

  • streamMetaData - A map of Meta key/value pairs to set on the Stream’s Meta Data on receipt. The attributes in streamMetaData will overwrite any matching attribute keys in the received data.

The property .receive.dataFeedOwnerMetaKey defines the Meta key that will be used to extract the owner of the Data Feed Identity. By default this key is set to accountId. It is typically an identifier for a client team that may have one or more systems, each requiring one or more Feeds in Stroom. An accountId can have many active Data Feed Identities.

Data Feed Keys

Data Feed Keys allow a set of hashed, short-lived keys to be placed in a directory accessible to Stroom-Proxy/Stroom, against which receipt requests can be authenticated.

{
  "type" : "DATA_FEED_KEY",
  "expiryDateEpochMs" : 1775237109581,
  "hash" : "$2a$10$JdngdVGxg6RGBeerku.JNusZdyyh4rNHYN5UeNKXRVdNUSNbg3NP6",
  "hashAlgorithm" : "BCRYPT_2A",
  "salt" : "$2a$10$JdngdVGxg6RGBeerku.JNu",
  "streamMetaData" : {
    "AccountId" : "1000",
    "MetaKey2" : "MetaKey2Val-1000",
    "MetaKey1" : "MetaKey1Val-1000"
  }
}

type must always be DATA_FEED_KEY for a Data Feed Key.

Data Feed Identities have an expiry date after which they will no longer work. Multiple files can be placed in the directory and all valid keys will be loaded.

The hashAlgorithm property identifies the hash algorithm used to hash the key. The system creating the hashed Data Feed Keys must use the same hash algorithm and parameters that Stroom will use when hashing the key supplied on data receipt to validate it.

Currently the only hash algorithm available for use is Argon2 with an ID of 000 and the following parameters:

  • Hash length: 48
  • Iterations: 2
  • Memory KB: 65536

A Data Feed Key takes the following form:

sdk_<3 char hash algorithm ID>_<128 char random Base58 string>

The regular expression pattern for a Data Feed Key is:

^sdk_[0-9]{3}_[A-HJ-NP-Za-km-z1-9]{128}$

Data Feed Keys are used in the same way as API Keys or OAuth2 tokens, i.e. using the header Authorization: Bearer <data feed key>.
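A structural check of the key format above can be written directly from the regular expression. The pattern is copied from the text; the example key below is fabricated for illustration and is not a real key.

```python
# Validate the structural form of a Data Feed Key:
# sdk_<3 digit algorithm ID>_<128 char Base58 string>
# (Base58 excludes 0, O, I and l, hence the character class.)
import re

DATA_FEED_KEY_PATTERN = re.compile(r"^sdk_[0-9]{3}_[A-HJ-NP-Za-km-z1-9]{128}$")

def is_valid_key_format(key: str) -> bool:
    return DATA_FEED_KEY_PATTERN.fullmatch(key) is not None

example = "sdk_000_" + "a" * 128   # structurally valid, fabricated key
print(is_valid_key_format(example))          # True
print(is_valid_key_format("sdk_000_short"))  # False
```

Note that this only checks the shape of the key; actual authentication requires hashing the key and comparing it against the stored hashes.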

Certificate Identities

These identities allow client systems to authenticate with an X509 certificate. Typically TLS will be terminated by Nginx or a load balancer sitting in front of Stroom/Stroom-Proxy, which will pass the DN as a header (configured by .receive.x509CertificateDnHeader).

{
  "type" : "CERTIFICATE_DN",
  "certificateDn" : "/DC=com/DC=example/DC=corp/OU=Users/CN=John Doe 2/emailAddress=john_doe@example.com",
  "expiryDateEpochMs" : 1775237109581,
  "streamMetaData" : {
    "AccountId" : "2002",
    "MetaKey2" : "MetaKey2Val-2002",
    "MetaKey1" : "MetaKey1Val-2002"
  }
}

type must always be CERTIFICATE_DN for a Certificate Identity.

certificateDn is the certificate’s DN (Distinguished Name) in the format defined by .receive.x509CertificateDnFormat.

When a client sends data, the DN extracted from the header will be checked against the DNs of all the Certificate Identities. If one matches and has not expired, the request is authenticated as that identity's owner and the entries from streamMetaData are set on the Stream's Meta Data.