Data Receipt
- 1: Feed Status Checking
- 2: Data Receipt Rules
- 3: Feed Name Generation
- 4: Content Templates
- 5: Data Feed Identities
1 - Feed Status Checking
Feed status checking is Stroom's legacy method for controlling data receipt. For a richer method of controlling data receipt, see Data Receipt Rules.
If the property stroom.receive.receiptCheckMode is set to FEED_STATUS, the Feed Status value that has been set on the Feed is used to determine the action to perform on that data.
Feed Status Values
A Feed can have the following Feed Status values:
- Receive - All data for this Feed will be received into Stroom / Stroom Proxy.
- Reject - All data for this Feed will be rejected. The client will get an HTTP 406 error with the message 110 - Feed is not set to receive data.
- Drop - All data for this Feed will be silently dropped by Stroom / Stroom Proxy, i.e. discarded and not stored. The client will receive an HTTP 200 response as if the data had been successfully received. This is for use if you do not want the client to know their data is being discarded.
Stroom Proxy
Stroom Proxy is also able to perform Feed status checking. Stroom Proxy does not have direct access to the Feed settings so has to perform the Feed status check by making a request to a downstream Stroom Proxy or Stroom. If a Stroom Proxy receives a Feed status check it will proxy that request to its own downstream Stroom / Stroom Proxy.
Stroom Proxy will cache the response it gets from the downstream, so that it doesn’t need to make a call for every stream received.
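That caching behaviour can be pictured as a simple time-based cache sitting in front of the downstream check. This is an illustrative sketch only; the TTL value, class and structure here are assumptions, not Stroom Proxy's actual implementation:

```python
import time


class FeedStatusCache:
    """Caches downstream Feed status responses so a remote check is not
    needed for every stream received (TTL is an assumed example value)."""

    def __init__(self, fetch_status, ttl_seconds=60.0):
        self.fetch_status = fetch_status  # callable that hits the downstream
        self.ttl = ttl_seconds
        self.entries = {}                 # feed name -> (status, fetched_at)

    def status(self, feed_name):
        entry = self.entries.get(feed_name)
        now = time.monotonic()
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]               # fresh cached value, no remote call
        status = self.fetch_status(feed_name)
        self.entries[feed_name] = (status, now)
        return status
```

With this shape, repeated streams for the same Feed within the TTL window cost no downstream round trip.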
To configure Stroom Proxy for Feed status checking you need to set the following properties:
proxyConfig:
receive:
# The action to take if there is a problem with the data receipt rules, e.g.
# Stroom Proxy has been unable to contact Stroom to fetch the rules.
fallbackReceiveAction: "RECEIVE"
receiptCheckMode: "FEED_STATUS"
downstreamHost:
# The API key to use for authentication (unless OpenID Connect is being used)
apiKey: null
# The hostname of the downstream
hostname: null
# The port to connect to the downstream on
# If not set, will default to 80/443 depending on scheme.
port: null
# The scheme to connect to the downstream on
scheme: "https"
2 - Data Receipt Rules
Data Receipt Rules serve as an alternative to the legacy Feed status checking performed by Stroom Proxy and Stroom. They provide a much richer mechanism for controlling which received data streams are Received, Rejected or Dropped, allowing anyone with the Manage Data Receipt Rules Application Permission to create one or more rules to control the receipt of data.
Data Receipt Rules can be accessed as follows:
Each rule is defined by a boolean expression (as used in Dashboards and Stream filtering) and the Action (Receive, Reject, Drop) that will be performed if the data matches the rule. Rules are evaluated in ascending order by Rule Number; the action is taken from the first rule to match.
If no rules match then the data will be rejected by default, i.e. the rules are include rather than exclude filters.
If you want data to be received if no rules match then you can create a rule at the end of the list with an Action of Receive and no expression terms.
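The first-match evaluation with a reject-by-default fallback can be sketched as follows. The Rule structure here is illustrative; real rules use Stroom's expression trees evaluated against stream meta data:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    number: int                      # rules are ordered by Rule Number
    matches: Callable[[dict], bool]  # expression evaluated against meta data
    action: str                      # "RECEIVE", "REJECT" or "DROP"


def evaluate(rules: list[Rule], meta: dict) -> str:
    # The action comes from the first rule, in ascending Rule Number order,
    # whose expression matches the stream's meta data.
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.matches(meta):
            return rule.action
    # No rule matched: reject by default (include-style filtering).
    return "REJECT"


rules = [
    Rule(1, lambda m: m.get("Feed") == "NOISY_FEED", "DROP"),
    # A final catch-all rule with no expression terms makes receipt the default.
    Rule(2, lambda m: True, "RECEIVE"),
]
```

The catch-all rule at the end mirrors the advice above for making receipt, rather than rejection, the default.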
If a stream matches a rule that has a Receive action, it will still be subject to a check to see if the Feed actually exists.
This means that the rules do not need to contain a Receive rule to cover all of the Feeds in the system; they only need to cover the data that you want to receive.
The client will receive a 101 Feed is not defined error if the Feed does not exist.
The screen operates in a similar way to Data Retention Rules in that rules can be moved up/down to change their importance, or enabled/disabled.
Fields
The fields available to use in the expression terms can be defined in the Fields tab. The terms will be evaluated against the stream’s meta data, i.e. a combination of the HTTP headers sent by the client and any that have been populated by Stroom Proxy or Stroom. This allows for the use of custom headers to aid in the filtering of data into Stroom.
Dictionaries are supported for use with the in dictionary condition.
The contents of the dictionary and any of the dictionaries that it inherits will be included in the data fetched by Stroom Proxy.
Note
You cannot use the same dictionary for multiple fields if any one of those fields is obfuscated.
Should you need to use the same dictionary for an obfuscated and a non-obfuscated field, you can create one empty dictionary for each and make them both import from the same source dictionary.
Stroom Configuration
Data Receipt Rules are controlled by the following configuration:
appConfig:
receiptPolicy:
# List of fields whose values will be obfuscated when the rules
# are fetched by Stroom Proxy
obfuscatedFields:
- "AccountId"
- "AccountName"
- "Component"
# ... truncated
- "UploadUserId"
- "UploadUsername"
- "X-Forwarded-For"
# The hash algorithm used to hash obfuscated values, one of:
# * SHA3_256
# * SHA2_256
# * BCRYPT
# * ARGON_2
# * SHA2_512
obfuscationHashAlgorithm: "SHA2_512"
# The initial list of fields to bootstrap a Stroom environment.
# Changing this has no effect once an environment has been started up.
receiptRulesInitialFields:
AccountId: "Text"
Component: "Text"
Compression: "Text"
content-length: "Text"
# ... truncated
Type: "Text"
UploadUsername: "Text"
UploadUserId: "Text"
user-agent: "Text"
X-Forwarded-For: "Text"
receive:
# The action to take if there is a problem with the data receipt rules, e.g.
# Stroom Proxy has been unable to contact Stroom to fetch the rules.
fallbackReceiveAction: "RECEIVE"
# The data receipt checking mode, one of:
# * FEED_STATUS - Use the legacy Feed Status Check method
# * RECEIPT_POLICY - Use the new Data Receipt Rules
# * RECEIVE_ALL - Receive ALL data with no checks
# * DROP_ALL - Drop ALL data with no checks
# * REJECT_ALL - Reject ALL data with no checks
receiptCheckMode: "RECEIPT_POLICY"
Stroom Proxy Configuration
proxyConfig:
receiptPolicy:
# Only set this if you need to supply a non-standard full url
# By default Proxy will use the known path for the Data Receipt Rules resource
# combined with the host/port/scheme from the `downstreamHost` config property.
receiveDataRulesUrl: null
# The frequency that the rules will be fetched from the downstream Stroom instance.
syncFrequency: "PT1M"
# Identical configuration to Stroom as described above.
# Stroom and Stroom Proxy can use different `receiptCheckMode` values, but typically
# they will be the same.
receive:
Stroom Proxy Rule Synchronisation
If Stroom Proxy is configured with receiptCheckMode set to RECEIPT_POLICY and has downstreamHost configured, then it will periodically send a request to Stroom to fetch the latest copy of the Data Receipt Rules.
If Stroom Proxy is unable to contact Stroom it will use the latest copy of the rules that it has.
Given that Stroom Proxy will only synchronise periodically, once a change is made to the rule set, there will be a delay before the new rules take effect.
Term Value Obfuscation
As a Stroom administrator you may not want the values used in the Data Receipt Rule expression terms to be visible when they are fetched by a remote Stroom Proxy (that may be maintained by another team).
It is therefore possible to obfuscate the values used for the expression terms for certain configured fields.
The fields that are obfuscated are controlled by the property stroom.receiptPolicy.obfuscatedFields.
For example, in the default configuration, Feed is an obfuscated field.
Thus a term like Feed != FEED_XYZ would have its value obfuscated when fetched by Stroom Proxy.
Stroom Proxy obfuscates the received meta data values for those same fields in the same way, allowing it to test the rule expressions against them.
Warning
Due to the way obfuscation works, you are limited in the expression conditions that can be used, e.g. contains, >, < etc. are not allowed, but == and != are.
Stroom will tell you if you are using an unsupported condition for the field.
This prevents the Stroom Proxy administrator from being able to see the values used in the rules as they are not in plain text.
Each value is salted with its own unique salt then hashed.
The hash algorithm can be configured using stroom.receiptPolicy.obfuscationHashAlgorithm.
Note
Obfuscation is not encryption. The fetched data includes the salt values and given enough compute/time it would be possible to brute force the reversal of the hashing. Strong hashing algorithms such as BCrypt or Argon2 can mitigate against this but not remove the risk. If the rule values are too sensitive then you will have to let the Stroom Proxy accept the data and have Stroom do the full rule based checking.
3 - Feed Name Generation
Auto-generation of Feed names allows Stroom and Stroom Proxy to generate the Feed name based on a configured template and the values of various mandatory and optional headers. This feature was conceived for Data Feed Identities but can be used in isolation if required.
When the property (app|proxy)Config.receive.feedNameGenerationEnabled is set to true, the Feed header is no longer required on data receipt and auto-generation of a Feed name will be attempted.
When data is supplied without the Feed header, the meta keys specified in (app|proxy)Config.receive.feedNameGenerationMandatoryHeaders become mandatory.
If the mandatory headers are not supplied, the data will be rejected.
The property (app|proxy)Config.receive.feedNameTemplate is used to control the format of the generated Feed name.
The template uses values from the headers, so it should be configured in tandem with .receive.feedNameGenerationMandatoryHeaders, though it can also use optional headers that the client may or may not supply.
If a template variable is not present in the headers, it will be replaced with an empty string.
The variables in the template (e.g. ${accountId}) are case-insensitive.
If enabled, Feed name generation happens on data receipt in both Stroom-Proxy and Stroom. You should therefore ensure the configuration for this feature is identical in Stroom and Stroom-Proxy.
The default configuration for Feed name generation is:
appConfig|proxyConfig: # applicable to both appConfig: and proxyConfig:
receive:
...
feedNameGenerationEnabled: false
feedNameGenerationMandatoryHeaders:
- "AccountId" # A unique identifier for the owner of the system sending the data.
- "Component" # The system/component that is sending the data (an account may have multiple).
- "Format" # The data format (e.g. XML, JSON, etc.).
- "Schema" # The schema that the data conforms to (e.g. event-logging).
feedNameTemplate: "${accountid}-${component}-${format}-${schema}"
See Also
For more explanation of the receive configuration branch, see Receive Configuration.
Assuming the above default configuration and that the client sends the following headers:
AccountId: 1234
Component: av-scanner
Format: XML
Schema: event-logging
This will result in an auto-generated Feed name of 1234-AV_SCANNER-XML-EVENT_LOGGING.
Note
When a template variable is replaced with a value from the headers, it is converted to upper case and any characters that are NOT in the regular expression character class [A-Z0-9_] will be replaced by a _ character.
Any static text in the template will also be converted to upper case; the supported characters for static text are [A-Z0-9_-], with all other characters being replaced with a _.
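The substitution and sanitisation rules above can be sketched like this (an illustration only; the real implementation lives inside Stroom/Stroom Proxy):

```python
import re


def generate_feed_name(template: str, headers: dict) -> str:
    # Header lookup is case-insensitive, matching the ${...} variables.
    lower_headers = {k.lower(): v for k, v in headers.items()}
    parts = re.split(r"(\$\{\w+\})", template)
    out = []
    for part in parts:
        var = re.fullmatch(r"\$\{(\w+)\}", part)
        if var:
            # Missing optional headers become an empty string.
            value = lower_headers.get(var.group(1).lower(), "")
            # Upper-case, then replace characters outside [A-Z0-9_] with '_'.
            out.append(re.sub(r"[^A-Z0-9_]", "_", value.upper()))
        else:
            # Static text allows [A-Z0-9_-]; everything else becomes '_'.
            out.append(re.sub(r"[^A-Z0-9_-]", "_", part.upper()))
    return "".join(out)


feed_name = generate_feed_name(
    "${accountid}-${component}-${format}-${schema}",
    {"AccountId": "1234", "Component": "av-scanner",
     "Format": "XML", "Schema": "event-logging"},
)
# feed_name == "1234-AV_SCANNER-XML-EVENT_LOGGING"
```

This reproduces the worked example above: the hyphen in av-scanner is outside [A-Z0-9_] so it becomes an underscore, while the hyphens that are static template text are preserved.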
4 - Content Templates
The aim of the Content Templates feature is to simplify the process of client systems sending data into Stroom. Instead of having to pre-create a Feed and Pipeline before a client can send data, Content Templates can be created to auto-create the content on receipt of the first Stream.
Content Templates are a set of expression rules with associated template content to generate when the rule matches on incoming data. If a client has used the correct headers and a Content Template matches, all the content required to process the data will be created and the data will be processed without any further involvement from the Stroom administrator.
In order to use Content Templates, the property appConfig.autoContentCreation.enable must be set to true.
Content Templates Screen
Content Templates can be managed in the Content Templates screen that is accessed from the main menu:
This screen allows a user with the Manage Content Templates application permission to create a number of content templates.
The settings available on a Content Template are as follows:
- Template Name
- A name for the template to aid the administrator when looking through a list of different templates.
- Description
- An optional and more detailed description of the purpose of the template.
- Template Type
- Determines how the Pipeline specified by the Pipeline setting is used.
  - INHERIT_PIPELINE - A new pipeline will be created that inherits from the pipeline specified by Pipeline. The new pipeline will be created in the explorer tree folder defined by (app|proxy)Config.receive.destinationExplorerPathTemplate.
  - PROCESSOR_FILTER - A new processor filter will be added to the existing pipeline specified in the template. No new documents will be created.
- Copy Pipeline Element Dependencies
- If Copy Pipeline Element Dependencies is ticked and the Template Type is INHERIT_PIPELINE, any documents that are direct dependencies of the specified Pipeline (e.g. Text Converter or XSLT) will be copied into the destination folder. The new Pipeline will have its dependencies changed to use the copied dependencies, allowing them to be edited without affecting the parent Pipeline.
- Pipeline
- An existing Pipeline to either inherit from or add a processor filter to, depending on the Template Type.
- Processor Priority
- The priority to assign to the pipeline processor when created. The higher the number, the higher the priority. The value must be between 1 and 100. The default priority is 10.
- Processor Max Concurrent Tasks
- The maximum number of concurrent tasks to assign to the pipeline processor when created. Zero means un-bounded.
- Expression
- Each template has an expression that will be used to match on the headers when auto-generation of content has been triggered. The template expressions are evaluated in order from the top; the first to match the data is used. If a template's expression matches, content will be created according to the settings in the template.
Configuration
The configuration for the Content Templates can be found here.
Content Auto-Creation
Depending on the configuration and the settings in the Content Template that matches on the data, the following will happen if the feed does not already exist. If the feed already exists then it is assumed the content creation has already happened or has been done manually, so nothing will happen.
INHERIT_PIPELINE Mode
1. Create a Stroom user for the authenticated identity that has sent the data.
2. Create a Stroom user group using the template defined by property groupTemplate.
   - Add the created Stroom user to this group.
   - Add this group to the group defined by groupParentGroupName.
3. If Copy Pipeline Element Dependencies is ticked:
   - Create a Stroom user group using the template defined by property additionalGroupTemplate.
   - Add the created Stroom user to this group.
   - If additionalGroupParentGroupName is defined and doesn't exist:
     - Create the Stroom user group specified in this property.
4. Create an explorer tree folder using the template defined by property destinationExplorerPathTemplate.
   - Grant VIEW permission to the created group.
   - Grant VIEW permission to the created additional group.
5. If Copy Pipeline Element Dependencies is ticked:
   - Create an explorer tree sub folder using the template defined by property destinationExplorerSubPathTemplate.
   - Grant VIEW permission to the created group.
   - Grant EDIT permission to the created additional group.
6. Create a Feed in the folder defined by destinationExplorerPathTemplate.
   - Grant VIEW permission to the created group.
   - Grant VIEW permission to the created additional group (if Copy Pipeline Element Dependencies is ticked).
7. Create a Pipeline in the folder defined by destinationExplorerPathTemplate and set it to inherit from the Pipeline defined in the Content Template.
   - Grant VIEW permission to the created group.
   - If Copy Pipeline Element Dependencies is ticked:
     - Copy the dependency documents of the parent Pipeline into this folder.
     - Grant VIEW permission to the created additional group.
   - Create a Processor Filter on the new Pipeline (using the priority and concurrency settings taken from the Content Template) with the following expression: Feed is X AND Type = Y (where X is the Feed created above and Y is the stream type of the received data).
8. If groupParentGroupName is defined:
   - Create the Stroom user group specified in this property if it doesn't exist.
   - Add the group defined by groupTemplate to this group.
9. If Copy Pipeline Element Dependencies is ticked and additionalGroupParentGroupName is defined:
   - Create the Stroom user group specified in this property if it doesn't exist.
   - Add the group defined by additionalGroupTemplate to this group.
Copy Dependencies Example
The following is an example of the content that will be created with the following assumptions:
- The Feed name is 1234-AV_SCANNER-XML-EVENT_LOGGING.
- AccountId: 1234 in the Meta data.
- Copy Pipeline Element Dependencies is ticked on the Content Template.
- Default autoContentCreation configuration.
Don’t Copy Dependencies Example
The following is an example of the content that will be created with the following assumptions:
- The Feed name is 1234-AV_SCANNER-XML-EVENT_LOGGING.
- AccountId: 1234 in the Meta data.
- Copy Pipeline Element Dependencies is NOT ticked on the Content Template.
- Default autoContentCreation configuration.
PROCESSOR_FILTER Mode
1. Create a Stroom user for the authenticated identity that has sent the data.
2. Create a Stroom user group using the template defined by property groupTemplate.
   - Add the created Stroom user to this group.
3. Create an explorer tree folder using the template defined by property destinationExplorerPathTemplate.
   - Grant VIEW permission to the created group.
4. Create a Feed in the folder defined by destinationExplorerPathTemplate.
   - Grant VIEW permission to the created group.
5. Create a Processor Filter on the existing Pipeline specified in the Content Template (using the priority and concurrency settings taken from the Content Template) with the following expression: Feed is X AND Type = Y (where X is the Feed created above and Y is the stream type of the received data).
6. If groupParentGroupName is defined:
   - Create the Stroom user group specified in this property if it doesn't exist.
   - Add the group defined by groupTemplate to this group.
Example
The following is an example of the content that will be created with the following assumptions:
- The Feed name is 1234-AV_SCANNER-XML-EVENT_LOGGING.
- AccountId: 1234 in the Meta data.
- Default autoContentCreation configuration.
Expression Fields
When creating the expression in a Content Template, the user will be limited to a set of fields to match on.
These fields will be matched against the meta data of the Stream.
The list of fields that can be used are configured using the property .autoContentCreation.templateMatchFields.
5 - Data Feed Identities
Data Feed Identities are a new authentication mechanism for data receipt into both Stroom-Proxy and Stroom via the /datafeed API. They combine a set of authentication identities with a pre-defined set of static meta entries.
There are currently two types of Data Feed Identities:
- Data Feed Keys - Similar to an API Key.
- Certificate Identities - Uses an X509 Distinguished Name for authentication.
Both types of identity are written to one or more files that are placed on the Stroom or Stroom Proxy Host in a directory configured by .receive.dataFeedIdentitiesDir.
The following is an example of a file containing one of each type:
{
"dataFeedIdentities" : [ {
"type" : "DATA_FEED_KEY",
"expiryDateEpochMs" : 1775237109581,
"hash" : "$2a$10$JdngdVGxg6RGBeerku.JNusZdyyh4rNHYN5UeNKXRVdNUSNbg3NP6",
"hashAlgorithm" : "BCRYPT_2A",
"salt" : "$2a$10$JdngdVGxg6RGBeerku.JNu",
"streamMetaData" : {
"AccountId" : "1000",
"MetaKey2" : "MetaKey2Val-1000",
"MetaKey1" : "MetaKey1Val-1000"
}
}, {
"type" : "CERTIFICATE_DN",
"certificateDn" : "/DC=com/DC=example/DC=corp/OU=Users/CN=John Doe 2/emailAddress=john_doe@example.com",
"expiryDateEpochMs" : 1775237109581,
"streamMetaData" : {
"AccountId" : "2002",
"MetaKey2" : "MetaKey2Val-2002",
"MetaKey1" : "MetaKey1Val-2002"
}
} ]
}
The file can contain zero or more of either type, and the directory can contain zero or more of these files. This allows for generating Data Feed Keys with a life of, say, 26 hours, adding a new file every day and deleting files older than 2 days.
The file(s) will be read on boot and all hashed keys will be stored in memory for receipt authentication. Files added to this directory while Stroom-Proxy/Stroom is running will be read and added to the in-memory store of hashed keys. Files deleted from this directory will result in all entries associated with the file path being removed from the in-memory store of hashed keys.
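The load-and-filter behaviour might look roughly like this. The .json glob and file naming are assumptions for illustration; only the directory property is documented:

```python
import json
import time
from pathlib import Path


def load_identities(directory: str) -> list[dict]:
    """Read every identity file in the configured directory, skipping
    identities that have already expired. Illustrative sketch only."""
    now_ms = int(time.time() * 1000)
    identities = []
    for path in sorted(Path(directory).glob("*.json")):  # assumed naming
        doc = json.loads(path.read_text())
        for identity in doc.get("dataFeedIdentities", []):
            if identity.get("expiryDateEpochMs", 0) > now_ms:
                identities.append(identity)
    return identities
```

In the real system this load happens at boot and again whenever files appear in the directory, with deletions removing the associated in-memory entries.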
Common properties
The following JSON properties are common to both types:
- type - The type of the identity, one of DATA_FEED_KEY or CERTIFICATE_DN.
- expiryDateEpochMs - The time the identity expires, expressed as milliseconds since the epoch.
- streamMetaData - A map of Meta key/value pairs to set on the Stream's Meta Data on receipt. The attributes in streamMetaData will overwrite any matching attribute keys in the received data.
The property .receive.dataFeedOwnerMetaKey defines the Meta key that will be used to extract the owner of the Data Feed Identity.
By default this key is set to accountId.
It is typically an identifier for a client team that may have one or more systems that require one or more Feeds in Stroom.
An accountId can have many active Data Feed Identities.
Data Feed Keys
Data Feed Keys allow for a set of hashed, short-life keys to be placed in a directory accessible to Stroom-Proxy/Stroom, against which receipt requests can be authenticated.
{
"type" : "DATA_FEED_KEY",
"expiryDateEpochMs" : 1775237109581,
"hash" : "$2a$10$JdngdVGxg6RGBeerku.JNusZdyyh4rNHYN5UeNKXRVdNUSNbg3NP6",
"hashAlgorithm" : "BCRYPT_2A",
"salt" : "$2a$10$JdngdVGxg6RGBeerku.JNu",
"streamMetaData" : {
"AccountId" : "1000",
"MetaKey2" : "MetaKey2Val-1000",
"MetaKey1" : "MetaKey1Val-1000"
}
}
type must always be DATA_FEED_KEY for a Data Feed Key.
Data Feed Identities have an expiry date after which they will no longer work. Multiple files can be placed in the directory and all valid keys will be loaded.
The hashAlgorithm is the identifier for the hash algorithm used to hash the key.
The system creating the hashed data feed keys must use the same hash algorithm and parameters when hashing the key as Stroom will use when it hashes the key used in data receipt to validate them.
Currently the only hash algorithm available for use is Argon2 with an ID of 000 and the following parameters:
- Hash length: 48
- Iterations: 2
- Memory KB: 65536
A Data Feed Key takes the following form:
sdk_<3 char hash algorithm ID>_<128 char random Base58 string>
The regular expression pattern for a Data Feed Key is
^sdk_[0-9]{3}_[A-HJ-NP-Za-km-z1-9]{128}$
Data Feed Keys are used in the same way as API Keys or OAuth2 tokens, i.e. using the header Authorization: Bearer <data feed key>.
Certificate Identities
These identities allow client systems to authenticate with an X509 certificate.
Typically the TLS will be terminated by an Nginx or load balancer sitting in front of Stroom/Stroom-Proxy, and it will pass the DN as a header (configured by .receive.x509CertificateDnHeader).
{
"type" : "CERTIFICATE_DN",
"certificateDn" : "/DC=com/DC=example/DC=corp/OU=Users/CN=John Doe 2/emailAddress=john_doe@example.com",
"expiryDateEpochMs" : 1775237109581,
"streamMetaData" : {
"AccountId" : "2002",
"MetaKey2" : "MetaKey2Val-2002",
"MetaKey1" : "MetaKey1Val-2002"
}
}
type must always be CERTIFICATE_DN for a Certificate Identity.
certificateDn is the certificate’s DN (Distinguished Name) in the format defined by .receive.x509CertificateDnFormat.
When a client sends data, the DN extracted from the header will be checked against all the DNs in the Certificate Identities.
If one matches and has not expired, the request is authenticated as that identity's owner and the Meta entries from streamMetaData are set on the Stream.