1 - Common Configuration
Configuration common to Stroom and Stroom-Proxy.
This YAML file, sometimes known as the Dropwizard configuration file (as it conforms to a structure defined by Dropwizard), is the primary means of configuring Stroom/Stroom-Proxy.
As a minimum this file should be used to configure anything that needs to be set before stroom can start up, e.g. web server, logging, database connection details, etc.
It is also used to configure anything that is specific to a node in a stroom cluster.
If you are using some form of scripted deployment, e.g. Ansible, then it can be used to set all stroom properties for the environment that stroom runs in.
If you are not using scripted deployments then you can maintain stroom’s node-agnostic configuration properties via the user interface.
Config File Structure
This file contains both the Dropwizard configuration settings (settings for ports, paths and application logging) and the Stroom/Stroom-Proxy application specific properties configuration.
The file is in YAML format and the application properties are located under the appConfig key.
For details of the Dropwizard configuration structure, see the Dropwizard configuration reference.
The file is split into sections using these keys:
- server - Configuration of the web server, e.g. ports, paths, request logging.
- logging - Configuration of application logging.
- jerseyClients - Configuration of the various Jersey HTTP clients in use. See Jersey HTTP Client Configuration.
- Application specific configuration:
  - appConfig - The Stroom configuration properties. These properties can be viewed/modified in the user interface.
  - proxyConfig - The Stroom-Proxy configuration properties. These properties can be viewed/modified in the user interface.
The following is an example of the YAML configuration file for Stroom:
# Dropwizard configuration section
server:
  # e.g. ports and paths
logging:
  # e.g. logging levels/appenders
jerseyClients:
  DEFAULT:
    # Configuration of the named client
# Stroom properties configuration section
appConfig:
  commonDbDetails:
    connection:
      jdbcDriverClassName: ${STROOM_JDBC_DRIVER_CLASS_NAME:-com.mysql.cj.jdbc.Driver}
      jdbcDriverUrl: ${STROOM_JDBC_DRIVER_URL:-jdbc:mysql://localhost:3307/stroom?useUnicode=yes&characterEncoding=UTF-8}
      jdbcDriverUsername: ${STROOM_JDBC_DRIVER_USERNAME:-stroomuser}
      jdbcDriverPassword: ${STROOM_JDBC_DRIVER_PASSWORD:-stroompassword1}
  contentPackImport:
    enabled: true
  ...
The following is an example of the YAML configuration file for Stroom-Proxy:
# Dropwizard configuration section
server:
  # e.g. ports and paths
logging:
  # e.g. logging levels/appenders
jerseyClients:
  DEFAULT:
    # Configuration of the named client
# Stroom-Proxy properties configuration section
proxyConfig:
  path:
    home: /some/path
  ...
appConfig Section
The appConfig section is special as it maps to the Properties seen in the Stroom user interface so values can be managed in the file or via the Properties screen in the Stroom UI.
The other sections of the file can only be managed via the YAML file.
In the Stroom user interface, properties are named with a dot notation key, e.g. stroom.contentPackImport.enabled.
Each part of the dot notation property name represents a key in the YAML file, e.g. for this example, the location in the YAML would be:
appConfig:
  contentPackImport:
    enabled: true # stroom.contentPackImport.enabled
The stroom part of the dot notation name is replaced with appConfig.
For more details on the link between this YAML file and Stroom Properties, see Properties.
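The mapping from a dot notation property name to its location in the YAML file can be sketched as follows. This is only an illustration of the naming rule described above, not code taken from Stroom; the function names `property_to_yaml_path` and `nest` are hypothetical.

```python
def property_to_yaml_path(property_name):
    """Convert a UI property name into the list of nested YAML keys."""
    parts = property_name.split(".")
    if parts[0] != "stroom":
        raise ValueError(f"Expected a name starting with 'stroom.': {property_name}")
    # The leading 'stroom' part is replaced with the 'appConfig' root key.
    return ["appConfig"] + parts[1:]


def nest(path, value):
    """Build the nested mapping that the YAML file would contain for this path."""
    result = value
    for key in reversed(path):
        result = {key: result}
    return result


path = property_to_yaml_path("stroom.contentPackImport.enabled")
print(path)              # ['appConfig', 'contentPackImport', 'enabled']
print(nest(path, True))  # {'appConfig': {'contentPackImport': {'enabled': True}}}
```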
Variable Substitution
The YAML configuration file supports Bash style variable substitution in the form of:
${ENV_VARIABLE_NAME:-default_value}
This allows values to be set either directly in the file or via an environment variable, e.g.
jdbcDriverClassName: ${STROOM_JDBC_DRIVER_CLASS_NAME:-com.mysql.cj.jdbc.Driver}
In the above example, if the STROOM_JDBC_DRIVER_CLASS_NAME environment variable is not set then the value com.mysql.cj.jdbc.Driver will be used instead.
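The substitution behaviour can be illustrated with a small stand-alone sketch. It is not the substitution code used by Dropwizard or Stroom; the `substitute` function is hypothetical and it assumes that only an unset variable triggers the default (Bash itself also applies the default to an empty value).

```python
import re

# Matches ${NAME} and ${NAME:-default}
_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def substitute(text, env):
    """Replace ${NAME:-default} tokens using the supplied environment mapping."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        # No value and no default: leave the token untouched (an assumption).
        return match.group(0)
    return _PATTERN.sub(replace, text)


line = "jdbcDriverClassName: ${STROOM_JDBC_DRIVER_CLASS_NAME:-com.mysql.cj.jdbc.Driver}"
print(substitute(line, {}))
# jdbcDriverClassName: com.mysql.cj.jdbc.Driver
print(substitute(line, {"STROOM_JDBC_DRIVER_CLASS_NAME": "org.example.Driver"}))
# jdbcDriverClassName: org.example.Driver
```

In a real deployment the mapping would be the process environment (e.g. `os.environ`); `org.example.Driver` is a made-up value used only to show the override.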
Typed Values
YAML supports typed values rather than just strings, see https://yaml.org/refcard.html.
YAML understands booleans, strings, integers, floating point numbers, as well as sequences/lists and maps.
Some properties will be represented differently in the user interface to the YAML file.
This is due to how values are stored in the database and how the current user interface works.
This will likely be improved in future versions.
For details of how different types are represented in the YAML and the UI, see Data Types.
Server configuration
The server section controls the configuration of the Jetty web server.
For full details of how to configure the server section, see the Dropwizard configuration reference.
The following is an example of the configuration for an application listening on HTTP.
server:
  # The base path for the main application and its API
  applicationContextPath: "/"
  # The base path for the administration pages/API
  # For Stroom-Proxy the default is /proxyAdmin
  adminContextPath: "/stroomAdmin"
  # The scheme/port for the main application and its API
  applicationConnectors:
    - type: http
      # For Stroom-Proxy the default is 8090
      port: 8080
      # Uses X-Forwarded-*** headers in request log instead of proxy server details.
      useForwardedHeaders: true
  # The scheme/port for the administration pages/API
  adminConnectors:
    - type: http
      # For Stroom-Proxy the default is 8091
      port: 8081
      useForwardedHeaders: true
Common Application Configuration
This section details configuration that is common in both the Stroom appConfig and Stroom-Proxy proxyConfig sections.
Receive Configuration
Configuration for controlling the receipt of data into Stroom and Stroom-Proxy through the /datafeed API.
appConfig / proxyConfig:
  receive:
    # An allow-list containing IP addresses or fully qualified host names to verify that the direct sender
    # of a request (e.g. a load balancer or reverse proxy) is trusted to supply certificate/DN headers
    # as configured with 'x509CertificateHeader' and 'x509CertificateDnHeader'.
    # If this list is null/empty then no check will be made on the client's address.
    allowedCertificateProviders: []
    # Standard cache configuration block for the cache of authenticated Datafeed Keys.
    # This cache is used to avoid having to re-verify every data feed key.
    authenticatedDataFeedKeyCache:
    # If true, the sender will be authenticated using a certificate or token depending on the
    # state of tokenAuthenticationEnabled and certificateAuthenticationEnabled. If the sender
    # can't be authenticated an error will be returned to the client.
    # If false, then authentication will be performed if a token/key/certificate
    # is present, otherwise data will be accepted without a sender identity
    authenticationRequired: true
    # The meta key that is used to identify the owner of a Data Feed Key. This
    # may be an AccountId or similar. It must be provided as a header when sending data
    # using the associated Data Feed Key, and its value will be checked against the value
    # held with the hashed Data Feed Key by Stroom. Default value is 'AccountId'.
    # Case does not matter
    dataFeedKeyOwnerMetaKey: "AccountId"
    # The directory where Stroom will look for datafeed key files.
    # Only used if datafeedKeyAuthenticationEnabled is true.
    # If the value is a relative path then it will be treated as being
    # relative to stroom.path.home. Data feed key files must have the extension .json.
    # Files in sub-directories will be ignored.
    dataFeedKeysDir: "data_feed_keys"
    # The types of authentication that are enabled for data receipt.
    # One or more of
    # TOKEN - A Stroom API Key or an OAuth token in the 'Authorization' header
    # CERTIFICATE - An X509 certificate on the request or a DN in the header configured
    # by .receive.x509CertificateDnHeader
    # DATA_FEED_KEY - A Stroom Data Feed Key in the 'Authorization' header
    enabledAuthenticationTypes:
      - "TOKEN"
      - "CERTIFICATE"
    # If receiptCheckMode is RECEIPT_POLICY or FEED_STATUS and stroom/proxy is
    # unable to perform the receipt check, then this action will be used as a fallback
    # until the receipt check can be successfully performed
    fallbackReceiveAction: "RECEIVE"
    # If true the client is not required to set the 'Feed' header. If Feed is not present
    # a feed name will be generated based on the template specified by the
    # 'feedNameTemplate' property. If false (the default), a populated 'Feed'
    # header will be required
    feedNameGenerationEnabled: false
    # The set of header keys that are mandatory if feedNameGenerationEnabled is set to true.
    # Should be set to complement the header keys used in 'feedNameTemplate', but may be a
    # sub-set of those in the template to allow for optional headers
    feedNameGenerationMandatoryHeaders:
      - "AccountId"
      - "Component"
      - "Format"
      - "Schema"
    # A template for generating a feed name from a set of headers. The value of
    # each header referenced in the template will have any unsuitable characters
    # replaced with '_'.
    # If this property is set in the YAML file, use single quotes to prevent the
    # variables being expanded when the config file is loaded
    feedNameTemplate: "${accountid}-${component}-${format}-${schema}"
    # If defined then states the maximum size of a request (uncompressed for gzip requests).
    # Will return a 413 Content Too Large response code for any requests exceeding this
    # value. If undefined then there is no limit to the size of the request.
    maxRequestSize: null
    # Set of supported meta type names. This set must contain all of the names
    # in the default value for this property but can contain additional names.
    metaTypes:
      - "Context"
      - "Detections"
      - "Error"
      - "Events"
      - "Meta Data"
      - "Raw Events"
      - "Raw Reference"
      - "Records"
      - "Reference"
      - "Test Events"
      - "Test Reference"
    # Controls how or whether data is checked on receipt. Valid values
    # (FEED_STATUS|RECEIPT_POLICY|RECEIVE_ALL|REJECT_ALL|DROP_ALL)
    receiptCheckMode: "FEED_STATUS"
    # The format of the Distinguished Name used in the certificate. Valid values are
    # LDAP and OPEN_SSL, where LDAP is the default
    x509CertificateDnFormat: "LDAP"
    # The HTTP header key used to extract the distinguished name (DN) as obtained from an X509 certificate.
    # This is used when a load balancer does the SSL/mTLS termination and passes the client DN through
    # in a header. Only used for
    # authentication if a value is set and 'enabledAuthenticationTypes' includes CERTIFICATE
    x509CertificateDnHeader: "X-SSL-CLIENT-S-DN"
    # The HTTP header key used to extract an X509 certificate. This is used when a load balancer does the
    # SSL/mTLS termination and passes the client certificate through in a header. Only used for
    # authentication if a value is set and 'enabledAuthenticationTypes' includes CERTIFICATE
    x509CertificateHeader: "X-SSL-CERT"
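To illustrate how feedNameGenerationMandatoryHeaders and feedNameTemplate work together, the following is a rough sketch of feed name generation. It is not Stroom's implementation and it rests on two loudly stated assumptions: header names are matched without regard to case, and an "unsuitable" character is anything other than a letter, digit, '_' or '-'. The function name `generate_feed_name` and the header values are hypothetical.

```python
import re


def generate_feed_name(template, headers, mandatory_headers):
    """Build a feed name from request headers using a ${header} style template."""
    # Assumption: header names are compared case-insensitively.
    lower = {key.lower(): value for key, value in headers.items()}

    missing = [name for name in mandatory_headers if not lower.get(name.lower())]
    if missing:
        raise ValueError(f"Missing mandatory headers: {missing}")

    def replace(match):
        value = lower.get(match.group(1).lower(), "")
        # Assumption: only letters, digits, '_' and '-' are treated as suitable.
        return re.sub(r"[^A-Za-z0-9_-]", "_", value)

    return re.sub(r"\$\{([^}]+)\}", replace, template)


template = "${accountid}-${component}-${format}-${schema}"
mandatory = ["AccountId", "Component", "Format", "Schema"]
headers = {
    "AccountId": "1234",
    "Component": "web app",
    "Format": "XML",
    "Schema": "event-logging",
}
print(generate_feed_name(template, headers, mandatory))
# 1234-web_app-XML-event-logging
```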
Cache Configuration
Multiple configuration branches in both Stroom and Stroom-Proxy have one or more properties for configuring a cache.
Each of these share the same structure and will typically be named xxxCache, e.g. feedStatusCache or metaTypeCache.
Warning
The default values for each property within the cache config will be specific to the cache.
Care needs to be taken when changing the cache properties to avoid changing the behaviour of the cache, e.g. changing from having an expireAfterWrite value to having an expireAfterAccess value may prevent items from aging off as expected.
xxxCache:
  # Specifies that each entry should be automatically removed from the cache once
  # this duration has elapsed after the entry's creation, the most recent replacement of
  # its value, or its last read. In ISO-8601 duration format, e.g. 'PT10M'. If no value is set then
  # entries will not be aged out based on these criteria
  expireAfterAccess:
  # Specifies that each entry should be automatically removed from the cache once
  # a fixed duration has elapsed after the entry's creation, or the most recent replacement of its value.
  # In ISO-8601 duration format, e.g. 'PT5M'. If no value is set then entries will not be aged out based on
  # these criteria.
  expireAfterWrite:
  # Specifies the maximum number of entries the cache may contain. Note that the cache
  # may evict an entry before this limit is exceeded or temporarily exceed the threshold while evicting.
  # As the cache size grows close to the maximum, the cache evicts entries that are less likely to be used
  # again. For example, the cache may evict an entry because it hasn't been used recently or very often.
  # When size is zero, elements will be evicted immediately after being loaded into the cache. This can
  # be useful in testing, or to disable caching temporarily without a code change. If no value is set then
  # no size limit will be applied
  maximumSize:
  # Specifies that each entry should be automatically refreshed in the cache after
  # a fixed duration has elapsed after the entry's creation, or the most recent replacement of its value.
  # In ISO-8601 duration format, e.g. 'PT5M'. Refreshing is performed asynchronously and the current value
  # provided until the refresh has occurred. This mechanism allows the cache to update values without any
  # impact on performance
  refreshAfterWrite:
  # Determines whether/how statistics are captured on cache usage
  # (e.g. hits, misses, entries, etc.). Values are (NONE, INTERNAL, DROPWIZARD_METRICS).
  # NONE means capture no stats, offering a very slight performance gain, but the Caches screen in Stroom
  # won't be able to show any stats for this cache.
  # INTERNAL means the stats are captured but are only accessible via the Stroom Caches screen, thus not
  # suitable for Stroom-Proxy.
  # DROPWIZARD_METRICS means the stats are captured and are accessible via the Stroom Caches screen AND via
  # the metrics servlet on the admin port for integration with tools like Graphite/Collectd
  # The default for Stroom is INTERNAL, the default for Stroom-Proxy is DROPWIZARD_METRICS
  statisticsMode:
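The difference between expireAfterWrite and expireAfterAccess, and why swapping one for the other can stop entries aging off, can be seen in the following toy model. It is a simplified sketch written for this page, not the cache implementation used by Stroom; `SketchCache` and `read_every` are hypothetical names and times are plain seconds rather than ISO-8601 durations.

```python
class SketchCache:
    """Toy cache; 'now' is a plain number of seconds supplied by the caller."""

    def __init__(self, expire_after_write=None, expire_after_access=None):
        self.expire_after_write = expire_after_write
        self.expire_after_access = expire_after_access
        self._entries = {}  # key -> [value, written_at, accessed_at]

    def put(self, key, value, now):
        self._entries[key] = [value, now, now]

    def get(self, key, now):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, written_at, accessed_at = entry
        too_old = (self.expire_after_write is not None
                   and now - written_at >= self.expire_after_write)
        too_idle = (self.expire_after_access is not None
                    and now - accessed_at >= self.expire_after_access)
        if too_old or too_idle:
            del self._entries[key]
            return None
        entry[2] = now  # a read resets only the access timer
        return value


def read_every(cache, interval, reads):
    """Write one entry at time 0, then read it at a fixed interval."""
    cache.put("feed", "status", now=0)
    return [cache.get("feed", now=interval * i) for i in range(1, reads + 1)]


# Both caches use a five minute (300s) limit and are read every 200s.
print(read_every(SketchCache(expire_after_write=300), 200, 3))
# ['status', None, None]
print(read_every(SketchCache(expire_after_access=300), 200, 3))
# ['status', 'status', 'status']
```

With expireAfterAccess, an entry that is read more often than the configured duration never ages off, so a stale value can be served indefinitely.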
Open ID Configuration
Both Stroom and Stroom-Proxy share the same configuration structure for configuring Open ID Connect authentication.
This section of config is only applicable if appConfig/proxyConfig.security.authentication.identityProviderType is set to EXTERNAL_IDP.
appConfig / proxyConfig:
  security:
    authentication:
      openId:
        # A set of audience claim values, one of which must appear in the audience
        # claim in the token.
        # If empty, no validation will be performed on the audience claim
        # If audienceClaimRequired is false and there is no audience claim in the token,
        # then allowedAudiences will be ignored
        allowedAudiences: []
        # If true the token will fail validation if the audience claim is not present
        # and allowedAudiences is not empty
        audienceClaimRequired: false
        # The authentication endpoint used in OpenId authentication
        # Should only be set if not using a configuration endpoint
        authEndpoint: null
        # If custom scopes are required for client_credentials requests then this should be
        # set to replace the default of 'openid'. E.g. for Azure AD you will likely need to set
        # this to 'openid' and '<your-app-id-uri>/.default'
        clientCredentialsScopes:
          - "openid"
        # The client ID used in OpenId authentication.
        clientId: null
        # The client secret used in OpenId authentication.
        clientSecret: null
        # If using an AWS load balancer to handle the authentication, set this to the Amazon
        # Resource Names (ARN) of the load balancer(s) fronting stroom, which will be something
        # like 'arn:aws:elasticloadbalancing:region-code:account-id:loadbalance
        # /app/load-balancer-name/load-balancer-id'.
        # This config value will be used to verify the 'signer' in the JWT header.
        # Each value is the first N characters of the ARN and as a minimum must include up to
        # the colon after the account-id, i.e.
        # 'arn:aws:elasticloadbalancing:region-code:account-id:'
        # See https://docs.aws.amazon.com/elasticloadbalancing/latest/application/listener-authenticate-users.html#user-claims-encodin
        expectedSignerPrefixes: []
        # Some OpenId providers, e.g. AWS Cognito, require a form to be used for token requests.
        formTokenRequest: true
        # A template to build the user's full name using claim values as variables in the
        # template. E.g. '${firstName} ${lastName}' or '${name}'.
        # If this property is set in the YAML file, use single quotes to prevent the
        # variables being expanded when the config file is loaded. Note: claim names are
        # case sensitive
        fullNameClaimTemplate: "${name}"
        # The type of Open ID Connect identity provider that stroom/proxy
        # will use for authentication. Valid values are:
        # INTERNAL_IDP - Stroom's internal IDP. Not valid for Stroom-Proxy.
        # EXTERNAL_IDP - An external IDP such as KeyCloak/Cognito,
        # TEST_CREDENTIALS - Use hard-coded authentication credentials for test/demo only and
        # NO_IDP - No IDP is used. API keys are set in config for feed status checks. Only for use by Stroom-Proxy
        # Changing this property will require a restart of the application
        identityProviderType: "NO_IDP"
        # The issuer used in OpenId authentication.
        # Should only be set if not using a configuration endpoint
        issuer: null
        # The URI to obtain the JSON Web Key Set from in OpenId authentication
        # Should only be set if not using a configuration endpoint
        jwksUri: null
        # The logout endpoint for the identity provider
        # This is not typically provided by the configuration endpoint
        logoutEndpoint: null
        # The name of the URI parameter to use when passing the logout redirect URI to the IDP.
        # This is here as the spec seems to have changed from 'redirect_uri' to
        # 'post_logout_redirect_uri'
        logoutRedirectParamName: "post_logout_redirect_uri"
        # You can set an openid-configuration URL to automatically configure much of the openid
        # settings. Without this the other endpoints etc must be set manually
        openIdConfigurationEndpoint: null
        # If the token is signed by AWS then use this pattern to form the URI to obtain the
        # public key from. The pattern supports the variables '${awsRegion}' and '${keyId}'.
        # Multiple instances of a variable are also supported.
        # If this property is set in the YAML file, use single quotes to prevent the
        # variables being expanded when the config file is loaded.
        publicKeyUriPattern: "https://public-keys.auth.elb.${awsRegion}.amazonaws.com/${keyId}"
        # If custom auth flow request scopes are required then this should be set to replace
        # the defaults of 'openid' and 'email'.
        requestScopes:
          - "openid"
          - "email"
        # The token endpoint used in OpenId authentication
        # Should only be set if not using a configuration endpoint
        tokenEndpoint: null
        # The Open ID Connect claim used to link an identity on the IDP to a stroom user.
        # Must uniquely identify the user on the IDP and not be subject to change. Uses 'sub' by
        # default
        uniqueIdentityClaim: "sub"
        # The Open ID Connect claim used to provide a more human friendly username for a user
        # than that provided by uniqueIdentityClaim. It is not guaranteed to be unique and may
        # change
        userDisplayNameClaim: "preferred_username"
        # A set of issuers (in addition to the 'issuer' property that is provided by the IDP)
        # that are deemed valid when seen in a token. If no additional valid issuers are
        # required then set this to an empty set. Also this is used to validate the 'issuer'
        # returned by the IDP when it is not a sub path of 'openIdConfigurationEndpoint'. If
        # this set is empty then Stroom will verify that the
        validIssuers: []
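The interaction between allowedAudiences and audienceClaimRequired is easy to misread, so the following sketch spells out one reading of the property descriptions above. It is not Stroom's validation code; the function name `audience_is_valid` and the audience values are hypothetical.

```python
def audience_is_valid(token_audiences, allowed_audiences, audience_claim_required):
    """Apply the audience rules described in the property comments.

    token_audiences is the 'aud' claim from the token: None if absent,
    otherwise a single string or a list of strings.
    """
    if not allowed_audiences:
        # Empty allowedAudiences: no validation is performed.
        return True
    if not token_audiences:
        # No audience claim: only a failure if the claim is required.
        return not audience_claim_required
    if isinstance(token_audiences, str):
        token_audiences = [token_audiences]
    # At least one allowed audience must appear in the token's claim.
    return any(aud in allowed_audiences for aud in token_audiences)


allowed = ["stroom-client"]
print(audience_is_valid(None, [], True))                           # True
print(audience_is_valid(None, allowed, False))                     # True
print(audience_is_valid(None, allowed, True))                      # False
print(audience_is_valid("stroom-client", allowed, True))           # True
print(audience_is_valid(["other", "stroom-client"], allowed, True))  # True
print(audience_is_valid("other", allowed, False))                  # False
```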
Jersey HTTP Client Configuration
Stroom and Stroom Proxy use the Jersey client for making HTTP connections with other nodes or other systems (e.g. Open ID Connect identity providers).
In the YAML file, the jerseyClients key controls the configuration of the various clients in use.
To allow complete control of the client configuration, Stroom uses the concept of named client configurations.
Each named client will be unique to a destination (where a destination is typically a server or a cluster of functionally identical servers).
Thus the configuration of the connections to each of those destinations can be configured independently.
The client names are as follows:
- DEFAULT - The default client configuration used if a named configuration is not present.
- AWS_PUBLIC_KEYS - Connections to fetch AWS public keys used in Open ID Connect authentication.
- DOWNSTREAM - Connections to downstream proxy/stroom instances to check feed status (Stroom Proxy only).
- OPEN_ID - Connections to an Open ID Connect identity provider, e.g. Cognito, Azure AD, KeyCloak, etc.
- STROOM - Inter-node communications within the Stroom cluster (Stroom only).
Note
If a named configuration does not exist then the configuration for DEFAULT will be used.
If DEFAULT is not defined in the configuration then the Dropwizard defaults will be used.
The following is an example of how the clients are configured in the YAML file:
jerseyClients:
  DEFAULT:
    # Default client configuration, e.g.
    timeout: 500ms
  STROOM:
    # Configuration items for stroom inter-node communications
    timeout: 30s
  # etc.
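The fallback order for named clients (named configuration, then DEFAULT, then the Dropwizard defaults) can be sketched as a simple lookup. This is an illustration only, not Stroom's code. It assumes the whole DEFAULT block is used when a named block is absent; whether individual keys are merged between blocks is not covered here. The function name `resolve_client_config` is hypothetical.

```python
def resolve_client_config(jersey_clients, client_name):
    """Pick the configuration block to use for a named client."""
    if client_name in jersey_clients:
        return jersey_clients[client_name]
    if "DEFAULT" in jersey_clients:
        return jersey_clients["DEFAULT"]
    # Nothing configured: an empty block, i.e. Dropwizard's own defaults apply.
    return {}


# Mirrors the YAML example: only DEFAULT and STROOM are configured.
jersey_clients = {
    "DEFAULT": {"timeout": "500ms"},
    "STROOM": {"timeout": "30s"},
}

print(resolve_client_config(jersey_clients, "STROOM"))   # {'timeout': '30s'}
print(resolve_client_config(jersey_clients, "OPEN_ID"))  # {'timeout': '500ms'}
print(resolve_client_config({}, "OPEN_ID"))              # {}
```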
The configuration keys (along with their default values and descriptions) for each client can be found in the Dropwizard documentation for HTTP and Jersey client configuration.
The following is another example including most keys:
jerseyClients:
  DEFAULT:
    minThreads: 1
    maxThreads: 128
    workQueueSize: 8
    gzipEnabled: true
    gzipEnabledForRequests: true
    chunkedEncodingEnabled: true
    timeout: 500ms
    connectionTimeout: 500ms
    timeToLive: 1h
    cookiesEnabled: false
    maxConnections: 1024
    maxConnectionsPerRoute: 1024
    keepAlive: 0ms
    retries: 0
    userAgent: <application name> (<client name>)
    proxy:
      host: 192.168.52.11
      port: 8080
      scheme: http
      auth:
        username: secret
        password: stuff
        authScheme: NTLM
        realm: realm
        hostname: host
        domain: WINDOWSDOMAIN
        credentialType: NT
      nonProxyHosts:
        - localhost
        - '192.168.52.*'
        - '*.example.com'
    tls:
      protocol: TLSv1.2
      provider: SunJSSE
      verifyHostname: true
      keyStorePath: /path/to/file
      keyStorePassword: changeit
      keyStoreType: JKS
      trustStorePath: /path/to/file
      trustStorePassword: changeit
      trustStoreType: JKS
      trustSelfSignedCertificates: false
      supportedProtocols: TLSv1.1,TLSv1.2
      supportedCipherSuites: TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256
      certAlias: alias-of-specific-cert
Note
Duration values in the Jersey client configuration blocks are different to Stroom Durations defined in Stroom properties.
They are defined as a numeric value and a unit suffix.
Typical suffixes are (in ascending order): ns, us, ms, s, m, h, d.
ISO 8601 duration strings are NOT supported, nor are values without a suffix.
Full list of duration suffixes and their aliases
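As a rough guide to which values are acceptable, the following sketch parses the "number plus suffix" form described in the note above. It is not Dropwizard's parser: it only recognises the short suffixes listed in the note (Dropwizard also accepts longer aliases), and `parse_client_duration` is a hypothetical name.

```python
import re

# Nanoseconds per unit for the short suffixes listed in the note above.
_UNIT_NANOS = {
    "ns": 1,
    "us": 1_000,
    "ms": 1_000_000,
    "s": 1_000_000_000,
    "m": 60 * 1_000_000_000,
    "h": 3_600 * 1_000_000_000,
    "d": 86_400 * 1_000_000_000,
}
_DURATION = re.compile(r"^(\d+)\s*([a-z]+)$")


def parse_client_duration(text):
    """Return the duration in nanoseconds, or raise ValueError if not valid."""
    match = _DURATION.match(text.strip())
    if match is None or match.group(2) not in _UNIT_NANOS:
        raise ValueError(f"Not a '<number><suffix>' duration: {text!r}")
    return int(match.group(1)) * _UNIT_NANOS[match.group(2)]


print(parse_client_duration("500ms"))  # 500000000
print(parse_client_duration("30s"))    # 30000000000

for bad in ("PT10M", "500"):  # ISO 8601 and suffix-less values are rejected
    try:
        parse_client_duration(bad)
    except ValueError as error:
        print(error)
```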
Note
The paths used for the key and trust stores will be treated in the same way as Stroom property paths, i.e. relative to stroom.home if relative and supporting variable substitution.
Logging Configuration
The Dropwizard configuration file controls all the logging by the application.
In addition to the main application log, there are additional logs such as stroom user events (for audit), Stroom-Proxy send and receive logs and database migration logs.
For full details of the logging configuration, see Dropwizard Logging Configuration.
Request Log
The request log is slightly different to the other logs.
It logs all requests to the web server.
It is configured in the server section.
The property archivedLogFilenamePattern controls rolling of the active log file.
The date pattern in the filename controls the frequency that the log files are rolled.
In this example, files will be rolled every minute.
server:
  requestLog:
    appenders:
      - type: file
        currentLogFilename: logs/access/access.log
        discardingThreshold: 0
        # Rolled and gzipped every minute
        archivedLogFilenamePattern: logs/access/access-%d{yyyy-MM-dd'T'HH:mm}.log.gz
        archivedFileCount: 10080
        logFormat: '%h %l "%u" [%t] "%r" %s %b "%i{Referer}" "%i{User-Agent}" %D'
Logback Logs
Dropwizard uses Logback for application level logging.
All logs in Stroom and Stroom-Proxy apart from the request log are Logback based logs.
Logback uses the concept of Loggers and Appenders.
A Logger is a named thing that produces log messages.
An Appender is an output that a Logger can append its log messages to.
Typical Appenders are:
- File - appends messages to a file that may or may not be rolled.
- Console - appends messages to stdout.
- Syslog - appends messages to syslog.
Loggers
A Logger can append to more than one Appender if required.
For example, the default configuration file for Stroom has two appenders for the application logs.
The rolled files from one appender are POSTed to Stroom, so that it can index its own logs, and are then deleted. The files from the other appender are intended to remain on the server until archived off, to allow viewing by an administrator.
A Logger can be configured with a severity. Valid severities are, in ascending order, TRACE, DEBUG, INFO, WARN and ERROR.
The severity set on a logger means that only messages with that severity or higher will be logged, with the rest not logged.
Logger names are typically the name of the Java class that is producing the log message.
You don’t need to understand too much about Java classes as you are only likely to change logger severities when requested by one of the developers.
Some loggers, such as event-logger, do not have a Java class name.
As an example this is a portion of a Stroom config.yml file to illustrate the different loggers/appenders:
logging:
  # This is the root logging severity level for all loggers. Only messages >= to WARN will be logged unless overridden
  # for a specific logger
  level: WARN
  # All the named loggers
  loggers:
    # Logs useful information about stroom. Only set DEBUG on specific 'stroom' classes or packages
    # due to the large volume of logs that would be produced for all of 'stroom' in DEBUG.
    stroom: INFO
    # Logs useful information about dropwizard when booting stroom
    io.dropwizard: INFO
    # Logs useful information about the jetty server when booting stroom
    org.eclipse.jetty: INFO
    # Logs REST request/responses with headers/payloads. Set this to OFF to disable that logging.
    org.glassfish.jersey.logging.LoggingFeature: INFO
    # Logs summary information about FlyWay database migrations
    org.flywaydb: INFO
    # Logger and custom appender for audit logs
    event-logger:
      level: INFO
      # Prevents messages from this logger from being sent to other appenders
      additive: false
      appenders:
        - type: file
          currentLogFilename: logs/user/user.log
          discardingThreshold: 0
          # Rolled every minute
          archivedLogFilenamePattern: logs/user/user-%d{yyyy-MM-dd'T'HH:mm}.log
          # Minute rolled logs older than a week will be deleted. Note rolled logs are deleted
          # based on the age of the window they contain, not the number of them. This value should be greater
          # than the maximum time stroom is not producing events for.
          archivedFileCount: 10080
          logFormat: "%msg%n"
    # Logger and custom appender for the flyway DB migration SQL output
    org.flywaydb.core.internal.sqlscript:
      level: DEBUG
      additive: false
      appenders:
        - type: file
          currentLogFilename: logs/migration/migration.log
          discardingThreshold: 0
          # Rolled every day
          archivedLogFilenamePattern: logs/migration/migration-%d{yyyy-MM-dd}.log
          archivedFileCount: 10
          logFormat: "%-6level [%d{\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",UTC}] [%t] %logger - %X{code} %msg %n"
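The way a severity set on a logger filters messages, and how a logger without its own severity picks one up from a parent logger or the root level, can be sketched as follows. This models Logback's dotted-name hierarchy in a few lines for illustration only; it is not Logback code, and the logger names `stroom.example.SomeClass` and `com.example.Other` are hypothetical.

```python
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR"]  # ascending severity


def effective_level(logger_name, configured, root_level):
    """Walk up the dotted logger name until a configured level is found."""
    name = logger_name
    while name:
        if name in configured:
            return configured[name]
        name = name.rpartition(".")[0]  # drop the last dotted segment
    return root_level


def is_logged(message_level, logger_name, configured, root_level):
    """True if a message of this level would be logged by the named logger."""
    threshold = effective_level(logger_name, configured, root_level)
    return LEVELS.index(message_level) >= LEVELS.index(threshold)


# Levels taken from the example configuration: root is WARN, 'stroom' is INFO.
configured = {"stroom": "INFO", "org.flywaydb.core.internal.sqlscript": "DEBUG"}

print(is_logged("DEBUG", "stroom.example.SomeClass", configured, "WARN"))  # False
print(is_logged("INFO", "stroom.example.SomeClass", configured, "WARN"))   # True
print(is_logged("INFO", "com.example.Other", configured, "WARN"))          # False
print(is_logged("ERROR", "com.example.Other", configured, "WARN"))         # True
```

This is why setting DEBUG on a single class or package, rather than on all of `stroom`, keeps the log volume manageable.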
Appenders
The following is an example of the default appenders that will be used for all loggers unless they have their own custom appender configured.
logging:
  # Appenders for all loggers except for where a logger has a custom appender configured
  appenders:
    # stdout
    - type: console
      # Multi-coloured log format for console output
      logFormat: "%highlight(%-6level) [%d{\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",UTC}] [%green(%t)] %cyan(%logger) - %X{code} %msg %n"
      timeZone: UTC
    #
    # Minute rolled files for stroom/datafeed, will be curl'd/deleted by stroom-log-sender
    - type: file
      currentLogFilename: logs/app/app.log
      discardingThreshold: 0
      # Rolled and gzipped every minute
      archivedLogFilenamePattern: logs/app/app-%d{yyyy-MM-dd'T'HH:mm}.log.gz
      # One week using minute files
      archivedFileCount: 10080
      logFormat: "%-6level [%d{\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",UTC}] [%t] %logger - %X{code} %msg %n"
Log Rolling
Rolling of log files can be done based on size of file or time.
The archivedLogFilenamePattern property controls the rolling behaviour.
The rolling policy is determined from the filename pattern, e.g. a pattern with a minute precision date format will be rolled every minute.
The following is an example of an appender that rolls based on the size of the log file:
- type: file
  currentLogFilename: logs/app.log
  # The name pattern, where %i is a sequential number indicating age, where 1 is the most recent
  archivedLogFilenamePattern: logs/app-%i.log
  # The maximum number of rolled files to keep
  archivedFileCount: 10
  # The maximum size of a log file
  maxFileSize: "100MB"
  logFormat: "%-6level [%d{\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",UTC}] [%t] %logger - %X{code} %msg %n"
The following is an example of an appender that rolls every minute to gzipped files:
- type: file
  currentLogFilename: logs/app/app.log
  # Rolled and gzipped every minute
  archivedLogFilenamePattern: logs/app/app-%d{yyyy-MM-dd'T'HH:mm}.log.gz
  # One week using minute files
  archivedFileCount: 10080
  logFormat: "%-6level [%d{\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",UTC}] [%t] %logger - %X{code} %msg %n"
Warning
Log file rolling is event based, so a file will only roll when a new message arrives that would require a roll to happen.
This means that if the application is idle for a long period with no log output then the un-rolled file will remain active until a new message arrives to trigger it to roll. For example, if Stroom is unused overnight, then the file containing the last log messages from the night before will not be rolled until a new message arrives in the morning.
For this reason, archivedFileCount should be set to a value that covers a period greater than the maximum time the application may be idle, else rolled log files may be deleted as soon as they are rolled.
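The archivedFileCount values used in the examples follow from simple arithmetic: the retention period divided by the rolling period. The sketch below shows the calculation; the function name `archived_file_count` is hypothetical.

```python
from datetime import timedelta


def archived_file_count(retention, roll_period):
    """Number of rolled files needed to cover the retention period (rounded up)."""
    # Ceiling division on timedeltas: -(-a // b)
    return -(-retention // roll_period)


# One week of minute-rolled files, as used for the app and user logs.
print(archived_file_count(timedelta(days=7), timedelta(minutes=1)))  # 10080
# Ten days of daily-rolled files, as used for the migration log.
print(archived_file_count(timedelta(days=10), timedelta(days=1)))    # 10
```

When choosing the retention period, use a value longer than the longest expected idle period so that a late roll does not cause the file to be deleted immediately.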
2 - Stroom Configuration
Describes how the Stroom application is configured.
General configuration
The Stroom application is essentially just an executable JAR file that can be run when provided with a configuration file, config.yml.
This config file is common to all forms of deployment.
config.yml
Stroom operates on a configuration by exception basis: all configuration properties have a sensible default value, and a property only needs to be explicitly configured if the default value is not appropriate, e.g. for tuning a large scale production deployment or where values are environment specific.
As a result config.yml only contains a minimal set of properties.
The full tree of properties can be seen in ./config/config-defaults.yml and a schema for the configuration tree (along with descriptions for each property) can be found in ./config/config-schema.yml.
These two files can be used as a reference when configuring stroom.
Key Configuration Properties
The following are key properties that would typically be changed for a production deployment.
All configuration branches are relative to the appConfig root.
The database name(s), hostname(s), port(s), username(s) and password(s) should be configured using these properties.
Typically stroom is configured to keep its statistics data in a separate database to the main stroom database, as is configured below.
commonDbDetails:
  connection:
    jdbcDriverUrl: "jdbc:mysql://localhost:3307/stroom?useUnicode=yes&characterEncoding=UTF-8"
    jdbcDriverUsername: "stroomuser"
    jdbcDriverPassword: "stroompassword1"
statistics:
  sql:
    db:
      connection:
        jdbcDriverUrl: "jdbc:mysql://localhost:3307/stats?useUnicode=yes&characterEncoding=UTF-8"
        jdbcDriverUsername: "statsuser"
        jdbcDriverPassword: "stroompassword1"
In a clustered deployment each node must be given a node name that is unique within the cluster.
This is used to identify nodes in the Nodes screen.
It could be the hostname of the node or follow some other naming convention.
node:
  name: "node1a"
Each node should have its identity on the network configured so that it uses the appropriate FQDNs.
The nodeUri hostname is the FQDN of each node and used by nodes to communicate with each other, therefore it can be private to the cluster of nodes.
The publicUri hostname is the public facing FQDN for stroom, i.e. the address of a load balancer or Nginx.
This is the address that users will use in their browser.
nodeUri:
  hostname: "localhost" # e.g. node5.stroomnodes.somedomain
publicUri:
  hostname: "localhost" # e.g. stroom.somedomain
Deploying without Docker
When run without Docker, Stroom is configured using two files.
The following locations are relative to the stroom home directory, i.e. the root of the distribution zip.
./config/config.yml - Stroom configuration YAML file
./config/scripts.env - Stroom scripts configuration env file
The distribution also includes these files which are helpful when it comes to configuring stroom.
./config/config-defaults.yml - Full version of the config.yml file containing all branches/leaves with default values set.
Useful as a reference for the structure and the default values.
./config/config-schema.yml - The schema defining the structure of the config.yml file.
scripts.env
This file is used by the various shell scripts like start.sh, stop.sh, etc.
This file should not need to be changed unless you want to change the locations where certain log files are written to or need to change the java memory settings.
In a production system it is highly likely that you will need to increase the java heap size as the default is only 2G.
The heap size settings and any other java command line options can be set by changing:
JAVA_OPTS="-Xms512m -Xmx2048m"
As part of a docker stack
When stroom is run as part of one of our docker stacks, e.g. stroom_core, there are some additional layers of configuration to take into account, but the configuration is still primarily done using the config.yml file.
Stroom’s config.yml file is found in the stack in ./volumes/stroom/config/ and this is the primary means of configuring Stroom.
The stack also ships with a default config.yml file baked into the docker image.
This minimal fallback file (located in /stroom/config-fallback/ inside the container) will be used in the absence of one provided in the docker stack configuration (./volumes/stroom/config/).
The default config.yml file uses environment variable substitution so some configuration items will be set by environment variables set into the container by the stack env file and the docker-compose YAML.
This approach is useful for configuration values that need to be used by multiple containers, e.g. the public FQDN of Nginx, so it can be configured in one place.
If you need to further customise the stroom configuration then it is recommended to edit the ./volumes/stroom/config/config.yml file.
This can either be a simple file with hard coded values or one that uses environment variables for some of its configuration items.
The configuration works as follows:
env file (stroom<stack name>.env)
|
|
| environment variable substitution
|
v
docker compose YAML (01_stroom.yml)
|
|
| environment variable substitution
|
v
Stroom configuration file (config.yml)
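The two substitution steps above can be pictured with a small Python sketch. It is only an illustration, not Stroom's implementation: the `${NAME}` / `${NAME:-default}` placeholder syntax, the `substitute` helper and the variable names `STROOM_HOST` and `PUBLIC_HOST` are all assumptions made for the example.

```python
import re

# Matches ${NAME} or ${NAME:-default}; this placeholder syntax is an assumption.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def substitute(text: str, env: dict) -> str:
    """Replace placeholders with values from env, falling back to any default."""

    def _replace(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        # Leave unresolved placeholders untouched so the gap stays visible.
        return match.group(0)

    return _PLACEHOLDER.sub(_replace, text)


# Hypothetical values: the env file sets a variable, the compose YAML passes it
# into the container under another name, and config.yml refers to that name.
env_file = {"STROOM_HOST": "stroom.somedomain"}
compose_environment = {"PUBLIC_HOST": substitute("${STROOM_HOST}", env_file)}
config_line = 'hostname: "${PUBLIC_HOST:-localhost}"'

resolved = substitute(config_line, compose_environment)
```

With the hypothetical values above, `resolved` carries the host name from the env file, while an empty environment would fall back to the default given in the placeholder.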
Ansible
If you are using Ansible to deploy a stack then it is recommended that all of stroom’s configuration properties are set directly in the config.yml file using a templated version of the file and to NOT use any environment variable substitution.
When using Ansible, the Ansible inventory is the single source of truth for your configuration so not using environment variable substitution for stroom simplifies the configuration and makes it clearer when looking at deployed configuration files.
Stroom-ansible has an example inventory for a single node stroom stack deployment.
The group_vars/all file shows how values can be set into the env file.
3 - Stroom Proxy Configuration
Describes how the Stroom-Proxy application is configured.
The configuration of Stroom-proxy is very much the same as for Stroom with the only difference being the structure of the application specific part of the config.yml file.
Stroom-proxy has a proxyConfig key in the YAML while Stroom has appConfig.
YAML Configuration File
The Stroom-proxy application is essentially just an executable JAR file that can be run when provided with a configuration file, config.yml.
This configuration file is common to all forms of deployment.
As Stroom-proxy does not have a user interface, the config.yml file is the only way of configuring Stroom-Proxy.
As with stroom, the config.yml file is split into three sections using these keys:
- server - Configuration of the web server, e.g. ports, paths, request logging.
See Server Configuration
- logging - Configuration of application logging.
See Logging Configuration
- proxyConfig - Stroom-Proxy specific configuration
See also Properties for more details on structure of the config.yml file and supported data types.
Stroom-Proxy operates on a configuration by exception basis so as far as is possible, all configuration properties will have a sensible default value and a property only needs to be explicitly configured if the default value is not appropriate (e.g. for tuning a large scale production deployment) or where values are environment specific (e.g. the hostname of a forward destination).
As a result the config.yml shipped with Stroom Proxy only contains a minimal set of properties.
The full tree of properties can be seen in ./config/config-defaults.yml and a schema for the configuration tree (along with descriptions for each property) can be found in ./config/config-schema.yml.
These two files can be used as a reference when configuring Stroom-Proxy.
In the snippets of YAML configuration below, the default sections
Basic Structure
Stroom-Proxy has a number of key functions which are all configured via its YAML configuration file.
The following YAML shows the high level structure of the Stroom-Proxy configuration file.
Each branch of this YAML is explained in more detail below.
proxyConfig:
  # This should be set to a value that is unique within your Stroom/Stroom-Proxy estate.
  # It is used in the unique ReceiptId that is set in the meta of received data so
  # provides provenance of where data was received at each stage.
  proxyId: null
  # If true, Stroom-Proxy will halt on start up if any errors are found in the YAML
  # configuration file. If false, the errors will simply be logged. Setting this to
  # false is not advised.
  haltBootOnConfigValidationFailure: true
  # Configuration of the base and temp paths used by Stroom-Proxy.
  # See Path Configuration below
  path:
  # This is the downstream (in flow of stream data terms) Stroom/Stroom-Proxy instance/cluster
  # used for feed status checks, supplying data receipt rules and verifying API keys.
  downstreamHost:
  # This controls the aggregation of received data into larger chunks prior to forwarding.
  # This is typically required to prevent Stroom receiving lots of small streams.
  aggregator:
  # If receive.receiptCheckMode is FEED_STATUS, this controls the feed status
  # checking. See Feed Status Configuration below.
  feedStatus:
  # Zero to many HTTP POST based destinations.
  # E.g. for forwarding to Stroom or another Stroom-Proxy
  forwardHttpDestinations:
  # Zero to many file system based destinations. See Forward Configuration below.
  forwardFileDestinations:
  # This controls the meta entries that will be included in the send and receive logs.
  logStream:
  # If receive.receiptCheckMode is RECEIPT_POLICY, this controls the fetching
  # of the policy rules.
  receiptPolicy:
  # This section is common to both Stroom and Stroom-Proxy
  # See Receive Configuration below.
  receive:
  # Configuration for authentication. See Security Configuration below.
  security:
Stroom-proxy should be configured to check the receipt status of feeds on receipt of data.
This is done by configuring the end point of a downstream stroom-proxy or stroom.
feedStatus:
  url: "http://stroom:8080/api/feedStatus/v1"
  apiKey: ""
The url should be the url for the feed status API on the downstream stroom(-proxy).
If this is on the same host then you can use the http endpoint, however if it is on a remote host then you should use https and the host of its nginx, e.g. https://downstream-instance/api/feedStatus/v1.
In order to use the API, the proxy must have a configured apiKey.
The API key must be created in the downstream stroom instance and then copied into this configuration.
If the proxy is configured to forward data then the forward destination(s) should be set.
This is the datafeed endpoint of the downstream stroom-proxy or stroom instance that data will be forwarded to.
This may also be the address of a load balancer or similar that is fronting a cluster of stroom-proxy or stroom instances.
See also Feed status certificate configuration.
forwardHttpDestinations:
  - enabled: true
    name: "downstream"
    forwardUrl: "https://some-host/stroom/datafeed"
forwardUrl specifies the URL of the datafeed endpoint on the destination host.
Each forward location can use a different key/trust store pair.
See also Forwarding certificate configuration.
If the proxy is configured to store then the location of the proxy repository may need to be configured if it needs to be in a different location to the proxy home directory, e.g. on another mount point.
Aggregator Configuration
proxyConfig:
  aggregator:
    enabled: true
    # Whether to split received ZIPs if they are too large.
    splitSources: true
    # Maximum number of items to include in an aggregate
    maxItemsPerAggregate: 1000
    # Maximum size of the aggregate in uncompressed bytes.
    # Aggregates may be larger than this if splitSources is false or single very
    # large streams are received.
    maxUncompressedByteSize: "1G"
    # The length of time that data is added to an aggregate for before the aggregate is closed.
    aggregationFrequency: "PT10M"
Note
The aggregator settings apply to all forwarders.
It is not possible for forwarders to use different aggregation settings.
If you need to forward to an HTTP destination but also want to forward to a file destination using different aggregator settings, e.g. to keep a local archive of the data, you would need to employ a second Stroom-Proxy.
Stroom-Proxy A would forward to the HTTP downstream and forward to Stroom-Proxy B over HTTP.
Stroom-Proxy B would forward to a file destination, using much larger aggregator thresholds.
Directory Scanner Configuration
This configuration controls the directories that Stroom-Proxy scans to look for ZIP files to ingest.
It is primarily used as a means of manually re-processing files that have failed to forward, either as a result of too many retries or due to an unrecoverable error.
proxyConfig:
  dirScanner:
    # One or more directories to scan.
    # If the path is relative it is treated as relative to the proxyConfig.path.home property.
    dirs:
      - "zip_file_ingest"
    # Whether directory scanning is enabled or not
    enabled: true
    # The directory to move any failed files to.
    # If the path is relative it is treated as relative to the proxyConfig.path.home property.
    failureDir: "zip_file_ingest_failed"
    # How frequently each directory is scanned for files.
    scanFrequency: "PT1M"
Downstream Host Configuration
This is the default downstream (in flow of stream data terms) Stroom/Stroom-Proxy instance/cluster used for feed status checks, supplying data receipt rules and verifying API keys.
By default it will be used as the default
proxyConfig:
  downstreamHost:
    # http or https
    scheme: "https"
    # If not set, will default to 80/443 depending on scheme
    port: 443
    hostname: "...STROOM-PROXY OR STROOM FQDN..."
    # If not using OpenID authentication you will need to provide an API key.
    apiKey: "sak_6a011e3e5d_oKimmDxfNwj......<truncated>.....HYQxHaR2"
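As a rough illustration of how the scheme, hostname and optional port combine, the following Python sketch builds a base URL and applies the 80/443 default described in the comment above. The `downstream_base_url` helper is hypothetical and omitting a default port from the URL is a choice made for this example; it is not taken from Stroom-Proxy's code.

```python
from typing import Optional

# Default ports per scheme, as described in the downstreamHost comments.
DEFAULT_PORTS = {"http": 80, "https": 443}


def downstream_base_url(scheme: str, hostname: str, port: Optional[int] = None) -> str:
    """Build the base URL of the downstream host from its three settings."""
    if scheme not in DEFAULT_PORTS:
        raise ValueError(f"scheme must be http or https, got {scheme!r}")
    effective_port = port if port is not None else DEFAULT_PORTS[scheme]
    # Leave the port out when it is the default for the scheme.
    if effective_port == DEFAULT_PORTS[scheme]:
        return f"{scheme}://{hostname}"
    return f"{scheme}://{hostname}:{effective_port}"
```

For example, `downstream_base_url("https", "stroom.some.domain")` gives a URL with no explicit port, whereas passing a non-default port such as 8443 keeps it in the URL.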
Event Store Configuration
The Event Store is used to store and aggregate individual events received via the /api/event API or the SQS Connectors.
Events are appended to files specific to the Feed and Stream Type of the event.
Once a threshold is reached, the file will be rolled and processed by Stroom-Proxy.
Each event is stored as a JSON line in the file.
proxyConfig:
  eventStore:
    # The size of an internal queue used to buffer aggregates that are ready to process.
    forwardQueueSize: 1000
    # The maximum age of the file before it is rolled.
    maxAge: "PT1M"
    # The maximum size of the file before it is rolled.
    maxByteCount: 9223372036854775807
    # The maximum number of events in the file before it is rolled.
    maxEventCount: 9223372036854775807
    # Configuration of the cache used for the event store.
    openFilesCache:
    # The frequency at which files are checked to see if they need to be rolled or not.
    rollFrequency: "PT10S"
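The rolling thresholds can be read as a simple "any limit reached" test, sketched below in Python. The value 9223372036854775807 is the largest signed 64-bit integer, so with the values shown only maxAge will trigger a roll in practice. The `RollThresholds` and `should_roll` names are hypothetical, and whether the real comparison is inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import timedelta

LONG_MAX = 2**63 - 1  # 9223372036854775807, i.e. effectively "no limit"


@dataclass(frozen=True)
class RollThresholds:
    """Mirrors maxAge, maxByteCount and maxEventCount with the values shown above."""
    max_age: timedelta = timedelta(minutes=1)
    max_byte_count: int = LONG_MAX
    max_event_count: int = LONG_MAX


def should_roll(age: timedelta, byte_count: int, event_count: int,
                thresholds: RollThresholds = RollThresholds()) -> bool:
    """Return True once any one of the three thresholds has been reached."""
    return (
        age >= thresholds.max_age
        or byte_count >= thresholds.max_byte_count
        or event_count >= thresholds.max_event_count
    )
```

A check like this would be evaluated once per rollFrequency interval for each open file.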
Feed Status Configuration
The configuration for performing feed status checks.
This section is only relevant if proxyConfig.receive.receiptCheckMode is set to FEED_STATUS.
proxyConfig:
  feedStatus:
    # Standard cache configuration block for configuring the cache of feed status check outcomes
    feedStatusCache:
    # The full URL to use for feed status checking.
    # ONLY set this if using a non-standard URL, otherwise
    # it will be derived from the downstreamHost.
    url: null
The configuration of the client certificates for feed status checks is done using the DOWNSTREAM jersey client configuration.
See Stroom and Stroom-Proxy Common Configuration.
Forward Configuration
Stroom-Proxy has two configuration branches for controlling forwarding as each has a different structure.
proxyConfig:
  # Zero to many HTTP POST based destinations.
  forwardHttpDestinations:
  # Zero to many file system based destinations.
  forwardFileDestinations:
Both types of forwarder have an enabled property.
If a forwarder’s enabled state is set to false it is as if the forwarder configuration does not exist, i.e. no data will be queued for that forwarder until its state is changed to true.
File Forward Destinations Configuration
proxyConfig:
  # Zero to many file system based destinations.
  forwardFileDestinations:
      # Stroom-Proxy will attempt to move files onto the forward destination using an atomic move.
      # This ensures that the move does not happen more than once. If an atomic move is not possible,
      # e.g. the destination is a remote file system that does not support an atomic move, then it will
      # fall back to a non-atomic move with the risk of it happening more than once. If you see warnings
      # in the logs or know the file system will not support atomic moves then set this to false
    - atomicMoveEnabled: true
      # Whether this destination is enabled or not.
      enabled: true
      # If Instant Forwarding is to be used.
      instant: false
      # The type of liveness check to perform:
      # READ - will attempt to read the file/dir specified in livenessCheckPath.
      # WRITE - will attempt to touch the file specified in livenessCheckPath.
      livenessCheckMode: "READ"
      # The path to use for regular liveness checking of this forward destination.
      # If null, empty or if the 'queue' property is not configured, then no liveness check
      # will be performed and the destination will be
      # assumed to be healthy. If livenessCheckMode is READ, livenessCheckPath can be a
      # directory or a file and stroom-proxy will attempt to check it can read the
      # file/directory. If livenessCheckMode is WRITE, then livenessCheckPath must be a
      # file and stroom-proxy will attempt to touch that file. It is
      # only recommended to set this property for a remote file system where
      # connection issues may be likely. If it is a relative path, it will be assumed
      # to be relative to 'path'
      livenessCheckPath: null
      # The unique name of the destination (across all file/http forward destinations).
      # The name is used in the directories on the file system, so do not change the name
      # once proxy has processed data. Must be provided.
      name: "...PROVIDE FORWARDER NAME..."
      # The base path of a directory to forward to.
      path: "...PROVIDE PATH..."
      # See Queue Configuration section below
      queue:
      # The templated relative sub-path of path.
      # The default path template is '${year}${month}${day}/${feed}'
      # Cannot be an absolute path and must resolve to a descendant of path.
      # For details of this configuration branch, see Path Templating Configuration below.
      subPathTemplate: null
HTTP Forward Destinations Configuration
proxyConfig:
  # Zero to many HTTP POST based destinations.
  forwardHttpDestinations:
      # If true, add Open ID authentication headers to the request. Only works if the identityProviderType
      # is EXTERNAL_IDP and the destination is in the same Open ID Connect realm as the OIDC client that this
      # proxy instance is using.
    - addOpenIdAccessToken: false
      # The API key to use when forwarding data if Stroom is configured to require an API key.
      # Does NOT use the API Key from downstreamHost config.
      apiKey: null
      # Whether this destination is enabled or not.
      enabled: true
      forwardHeadersAdditionalAllowSet: []
      # The full URL to forward to if different from <downstreamHost>/datafeed
      forwardUrl: null
      # Configuration of the HTTP client, see below.
      httpClient:
      # If Instant Forwarding is to be used.
      instant: false
      # Whether liveness checking of the HTTP destination will take place. The queue property
      # must also be configured for liveness checking to happen
      livenessCheckEnabled: true
      # The URL/path to check for liveness of the forward destination. The URL should return a 200 response
      # to a GET request for the destination to be considered live.
      # If the response from the liveness check is not a 200, forwarding
      # will be paused at least until the next liveness check is performed.
      # If this property is not set, the downstreamHost configuration will be combined with the default API
      # path (/status).
      # If this property is just a path, it will be combined with the downstreamHost configuration.
      # Only set this property if you wish to use a non-default path
      # or you want to use a different host/port/scheme to that defined in downstreamHost.
      livenessCheckUrl: null
      # The unique name of the destination (across all file/http forward destinations).
      # The name is used in the directories on the file system, so do not change the name
      # once proxy has processed data. Must be provided.
      name: "...PROVIDE FORWARDER NAME..."
      # See Queue Configuration section below
      queue:
Queue Configuration
Each forward destination (whether file or HTTP) has a queue configuration property that controls various aspects of forwarding, e.g. failure handling, delays, concurrency, etc.
forwardHttpDestinations / forwardFileDestinations:
  queue:
    # The sub-path template to use for data that could not be retried
    # or has reached a retry limit.
    errorSubPathTemplate:
      enabled: true
      pathTemplate: "${year}${month}${day}/${feed}"
      templatingMode: "REPLACE_UNKNOWN_PARAMS"
    # A delay to add before forwarding. Primarily for testing.
    forwardDelay: "PT0S"
    # Number of threads to process retries
    forwardRetryThreadCount: 1
    # Number of threads to handle forwarding
    forwardThreadCount: 5
    # Duration between liveness checks
    livenessCheckInterval: "PT1M"
    # The maximum time from the first failed forward attempt to continue retrying.
    # After this the data will be moved to the failure directory permanently.
    maxRetryAge: "P7D"
    # The maximum time between retries. Must be greater than or equal to retryDelay.
    maxRetryDelay: "P1D"
    # If false forwards will be attempted immediately and any failure will result in the
    # data being moved to the failure directory.
    queueAndRetryEnabled: false
    # The time between retries. If retryDelayGrowthFactor is >1, this value will grow
    # after each retry.
    retryDelay: "PT10M"
    # The factor to apply to retryDelay after each failed retry.
    retryDelayGrowthFactor: 1.0
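To get a feel for how retryDelay, retryDelayGrowthFactor, maxRetryDelay and maxRetryAge interact when queueAndRetryEnabled is true, the following Python sketch lists the waits between attempts for a single item. It is one reading of the comments above, not Stroom-Proxy's actual algorithm: the `retry_delays` function is hypothetical, and the exact point at which maxRetryAge cuts off retries is an assumption.

```python
from datetime import timedelta
from typing import List


def retry_delays(
    retry_delay: timedelta,
    growth_factor: float,
    max_retry_delay: timedelta,
    max_retry_age: timedelta,
) -> List[timedelta]:
    """Return the successive waits between forward attempts for one item."""
    if retry_delay <= timedelta(0) or growth_factor < 1.0:
        raise ValueError("retry_delay must be positive and growth_factor >= 1")
    if max_retry_delay < retry_delay:
        raise ValueError("max_retry_delay must be >= retry_delay")

    delays: List[timedelta] = []
    elapsed = timedelta(0)
    current = retry_delay
    # Keep retrying until the next wait would go past max_retry_age.
    while elapsed + current <= max_retry_age:
        delays.append(current)
        elapsed += current
        # Grow the delay, but never beyond the configured ceiling.
        current = min(current * growth_factor, max_retry_delay)
    return delays


# The values shown in the configuration above: a constant 10 minute delay.
constant = retry_delays(timedelta(minutes=10), 1.0, timedelta(days=1), timedelta(days=7))
# The same settings with a growth factor of 2: the delay doubles up to one day.
doubling = retry_delays(timedelta(minutes=10), 2.0, timedelta(days=1), timedelta(days=7))
```

Under these assumptions a growth factor above 1 greatly reduces the number of attempts made against a destination that stays down for days, at the cost of slower recovery once it returns.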
Path Templating Configuration
The following properties all share the same structure:
proxyConfig.forwardFileDestinations.[n].subPathTemplate
proxyConfig.forwardFileDestinations.[n].queue.errorSubPathTemplate
proxyConfig.forwardHttpDestinations.[n].queue.errorSubPathTemplate
xxxxxxTemplate:
  # Whether templating is enabled or not. If not enabled
  # no sub-path will be used.
  enabled: true
  # The template to use for the sub-path
  pathTemplate: "${year}${month}${day}/${feed}"
  # Controls how unknown parameters are dealt with. One of:
  # IGNORE_UNKNOWN_PARAMS - e.g. 'cat/${unknownparam}/dog' => 'cat/${unknownparam}/dog'
  # REMOVE_UNKNOWN_PARAMS - e.g. 'cat/${unknownparam}/dog' => 'cat/dog'
  # REPLACE_UNKNOWN_PARAMS - Replace unknown with 'XXX', e.g. 'cat/${unknownparam}/dog' => 'cat/XXX/dog'
  templatingMode: "REPLACE_UNKNOWN_PARAMS"
The following template parameters are supported:
${feed} - The Feed name.
${type} - The Stream Type.
${year} - The 4 digit year of the current date/time.
${month} - The 2 digit month of the current date/time.
${day} - The 2 digit day of the current date/time.
${hour} - The 2 digit hour of the current date/time.
${minute} - The 2 digit minute of the current date/time.
${second} - The 2 digit second of the current date/time.
${millis} - The 3 digit milliseconds of the current date/time.
${ms} - The current date/time as milliseconds since the Unix Epoch.
Liveness Checking
Each of the configured forward destinations has a liveness check that can be configured.
This allows Stroom Proxy to periodically check that the destination is live.
If the liveness check fails for a destination, all forwarding for that destination will be paused until a subsequent liveness check reports it as live again.
The liveness checks take the following forms:
- HTTP Destination - Performs a GET request to the URL configured using forwardHttpDestinations.[n].livenessCheckUrl.
If not configured it will use /status on the downstream host.
The destination is considered live if it gets a 200 response.
You can use a URL that allows the destination to control its liveness, i.e. to take itself off line during an upgrade.
- File Destination - Reads or writes (touch) to a file defined by forwardFileDestinations.[n].livenessCheckPath.
Liveness checking for a file destination may be useful if the destination is on a network file share.
livenessCheckMode controls whether a read or write to the file is performed.
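The two forms of check can be sketched in Python as below. Both functions are hypothetical illustrations of the rules described above rather than Stroom-Proxy's implementation; in particular, treating any value with a URL scheme as a full URL and using an access check for READ mode are assumptions.

```python
import os
from pathlib import Path
from typing import Optional
from urllib.parse import urlsplit


def resolve_liveness_url(downstream_base: str, liveness_check_url: Optional[str]) -> str:
    """Work out the URL that a GET request would be sent to."""
    base = downstream_base.rstrip("/")
    if not liveness_check_url:
        return base + "/status"  # default path on the downstream host
    if urlsplit(liveness_check_url).scheme:
        return liveness_check_url  # a full URL is used as is
    # Just a path: combine it with the downstream host.
    return base + "/" + liveness_check_url.lstrip("/")


def file_destination_is_live(mode: str, liveness_check_path: Optional[str]) -> bool:
    """READ checks that the path can be read, WRITE touches the file."""
    if not liveness_check_path:
        return True  # no check configured, so assumed healthy
    if mode not in ("READ", "WRITE"):
        raise ValueError(f"unknown livenessCheckMode: {mode}")
    path = Path(liveness_check_path)
    try:
        if mode == "READ":
            return os.access(path, os.R_OK)
        path.touch()
        return True
    except OSError:
        return False
```

A scheduler would call one of these at each livenessCheckInterval and pause forwarding to the destination while the result is negative.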
HTTP Client Configuration
proxyConfig:
  forwardHttpDestinations:
    httpClient:
      connectionRequestTimeout: "PT3M"
      connectionTimeout: "PT3M"
      cookiesEnabled: false
      keepAlive: "PT0S"
      maxConnections: 1024
      maxConnectionsPerRoute: 1024
      proxy: null
      retries: 0
      timeToLive: "PT1H"
      timeout: "PT3M"
      # Transport Layer Security, see below.
      tls: null
      userAgent: null
      validateAfterInactivityPeriod: "PT0S"
The tls branch of the configuration is for configuring Transport Layer Security (the successor to Secure Sockets Layer (SSL)).
It is null by default, i.e. no additional TLS configuration is used.
Its structure is:
proxyConfig:
  forwardHttpDestinations:
    httpClient:
      tls:
        protocol: "TLSv1.2"
        # The name of the JCE provider to use on client side for cryptographic support
        # (for example, SunJCE, Conscrypt, BC, etc). See Oracle documentation for more information.
        provider:
        # The path of the key store file
        keyStorePath: null
        # The password of the key store file
        keyStorePassword: null
        # The type of key store (usually JKS, PKCS12, JCEKS, Windows-MY, or Windows-ROOT).
        keyStoreType: "JKS"
        keyStoreProvider: null
        # The path of the trust store file
        trustStorePath: null
        # The password of the trust store file
        trustStorePassword: null
        # The type of trust store (usually JKS, PKCS12, JCEKS, Windows-MY, or Windows-ROOT).
        trustStoreType: "JKS"
        trustStoreProvider: null
        trustSelfSignedCertificates: false
        verifyHostname: false
        # A list of protocols (e.g., SSLv3, TLSv1) which are supported.
        # All other protocols will be refused.
        supportedProtocols: null
        # A list of cipher suites (e.g., TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256) which are supported.
        # All other cipher suites will be refused.
        supportedCiphers: null
        certAlias: null
Log Stream Configuration
This controls the meta entries that will be included in the send and receive logs.
proxyConfig:
  logStream:
    # The header attributes that will be output in the send/receive log lines.
    # They will be output in the order that they appear in this list.
    # Duplicates will be ignored, case does not matter.
    metaKeys:
      - "guid"
      - "receiptid"
      - "feed"
      - "system"
      - "environment"
      - "remotehost"
      - "remoteaddress"
      - "remotedn"
      - "remotecertexpiry"
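The ordering and de-duplication rules for metaKeys can be illustrated with a short Python sketch. The `normalise_meta_keys` function is hypothetical, and folding the keys to lower case is simply one way of expressing "case does not matter"; it is not a statement of how Stroom-Proxy stores them.

```python
from typing import Iterable, List


def normalise_meta_keys(keys: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each key, ignoring case, in list order."""
    seen = set()
    result: List[str] = []
    for key in keys:
        folded = key.strip().lower()
        if folded and folded not in seen:
            seen.add(folded)
            result.append(folded)
    return result


# A hypothetical list containing duplicates that differ only by case.
example_keys = normalise_meta_keys(["guid", "Feed", "FEED", "receiptid", "GUID"])
```

The later duplicates are dropped, so each key contributes one entry to a log line in the position of its first appearance.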
Path Configuration
proxyConfig:
  path:
    # By default all files read or written to by stroom-proxy will be in directories relative to
    # the home location. Ideally this should differ from the location of the Stroom Proxy
    # installed software as it has a different lifecycle.
    # If not set the location of the Stroom-Proxy application JAR file will be used and if that
    # can't be determined, <user's home>/.stroom will be used.
    home: "...SET TO AN ABSOLUTE PATH..."
    # The location for Stroom-Proxy's persisted data
    data: "data"
    # The location for any temporary files/directories.
    # If not set, will use a sub-directory called 'stroom-proxy' in the system temp dir,
    # i.e. as defined by 'java.io.tmpdir'.
    temp: null
All paths in the configuration file can be either relative or absolute.
If relative then they will be treated as being relative to the home path.
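The rule for relative and absolute paths amounts to the following, shown as a minimal Python sketch. The `resolve_path` function is hypothetical and POSIX style paths are assumed.

```python
from pathlib import PurePosixPath


def resolve_path(home: str, configured: str) -> str:
    """Absolute paths are used as they are; relative ones are resolved against home."""
    path = PurePosixPath(configured)
    if path.is_absolute():
        return str(path)
    return str(PurePosixPath(home) / path)
```

For example, with home set to /stroomdata/stroom-proxy/home, the relative value "data" resolves to /stroomdata/stroom-proxy/home/data, whereas an absolute value is left unchanged.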
Receipt Policy Configuration
This section of configuration is only applicable if proxyConfig.receive.receiptCheckMode is RECEIPT_POLICY.
It controls the fetching of the receipt policy rules from a downstream Stroom or Stroom-Proxy.
proxyConfig:
  receiptPolicy:
    # Only set if using a non-standard URL, else this is derived based on downstreamHost
    # config.
    receiveDataRulesUrl: null
    # The duration between calls to fetch the latest policy rules.
    syncFrequency: "PT1M"
The configuration of the client certificates for receipt policy checks is done using the DOWNSTREAM jersey client configuration.
See Stroom and Stroom-Proxy Common Configuration.
Receive Configuration
The receive configuration is common to both Stroom and Stroom-Proxy, see Receive Configuration
Security Configuration
proxyConfig:
  security:
    authentication:
      # This property is currently not used
      authenticationRequired: true
      # Open ID Connect configuration
      openId:
The openId branch of the config is common to both Stroom and Stroom-Proxy, see Open ID Configuration for details.
Amazon Simple Queue Service Configuration
Stroom-Proxy is able to consume messages from multiple AWS SQS queues.
Each message received from a queue will be added to the Event Store for aggregation by Feed and Stream Type.
proxyConfig:
  # Zero to many connectors
  sqsConnectors:
      # This property is not currently used
    - awsProfileName: null
      # The name of the AWS region the SQS queue exists in.
      awsRegionName: "...AWS REGION..."
      # The maximum time to wait when polling the queue for messages
      pollFrequency: "PT10S"
      # This property is not currently used
      queueName: null
      # The URL of the Amazon SQS queue from which messages are received.
      queueUrl: "...SQS QUEUE URL..."
Thread Configuration
Stroom-Proxy is able to run certain operations in parallel.
This configuration allows you to increase the number of threads used for each operation.
proxyConfig:
  threads:
    # Number of threads to consume from the aggregate input queue.
    aggregateInputQueueThreadCount: 1
    # Number of threads to consume from the forwarding input queue.
    forwardingInputQueueThreadCount: 1
    # Number of threads to consume from the pre-aggregate input queue.
    preAggregateInputQueueThreadCount: 1
    # Number of threads to consume from the zip splitting input queue.
    zipSplittingInputQueueThreadCount: 1
Deploying without Docker
Apart from the structure of the config.yml file, the configuration in a non-docker environment is the same as for stroom.
As part of a docker stack
The way Stroom-Proxy is configured is essentially the same as for stroom with the only real difference being the structure of the config.yml file as noted above.
As with stroom the docker stack comes with a ./volumes/stroom-proxy-*/config/config.yml file that will be used in the absence of a provided one.
Also as with stroom, the config.yml file supports environment variable substitution so can make use of environment variables set in the stack .env file and passed down via the docker-compose YAML files.
Certificates
Stroom-proxy makes use of client certificates for two purposes:
- Communicating with a downstream stroom/stroom-proxy in order to establish the receipt status for the feeds it has received data for.
- When forwarding data to a downstream stroom/stroom-proxy
The stack comes with the following files that can be used for demo/test purposes.
volumes/stroom-proxy-*/certs/ca.jks
volumes/stroom-proxy-*/certs/client.jks
For a production deployment these will need to be replaced with the certificates that are appropriate for your environment.
Typical Configuration
The following is a guide to typical configurations for operating a Stroom-Proxy with different use cases.
Store and Forward
This is a typical case where you want to aggregate received data then forward it to a downstream Stroom or Stroom-Proxy, but also retain a store of the aggregates.
server:
  applicationContextPath: /
  adminContextPath: /proxyAdmin
  applicationConnectors:
    - type: http
      port: "8090"
      useForwardedHeaders: true
  adminConnectors:
    - type: http
      port: "8091"
      useForwardedHeaders: true
  detailedJsonProcessingExceptionMapper: true
  requestLog:
    appenders:
        # Log appender for the web server request logging
      - type: file
        currentLogFilename: logs/access/access.log
        discardingThreshold: 0
        # Rolled and gzipped every minute
        archivedLogFilenamePattern: logs/access/access-%d{yyyy-MM-dd'T'HH:mm}.log.gz
        # One week using minute files
        archivedFileCount: 10080
        logFormat: '%h %l "%u" [%t] "%r" %s %b "%i{Referer}" "%i{User-Agent}" %D'
logging:
  level: WARN
  loggers:
    # Logs useful information about stroom proxy. Only set DEBUG on specific 'stroom' classes or packages
    # due to the large volume of logs that would be produced for all of 'stroom' in DEBUG.
    stroom: INFO
    # Logs useful information about dropwizard when booting stroom
    io.dropwizard: INFO
    # Logs useful information about the jetty server when booting stroom
    # Set this to INFO if you want to log all REST request/responses with headers/payloads.
    org.glassfish.jersey.logging.LoggingFeature: OFF
    # Logger and appender for proxy receipt audit logs
    "receive":
      level: INFO
      additive: false
      appenders:
        - type: file
          currentLogFilename: logs/receive/receive.log
          discardingThreshold: 0
          # Rolled and gzipped every minute
          archivedLogFilenamePattern: logs/receive/receive-%d{yyyy-MM-dd'T'HH:mm}.log.gz
          # One week using minute files
          archivedFileCount: 10080
          logFormat: "%-6level [%d{yyyy-MM-dd'T'HH:mm:ss.SSS'Z'}] [%t] %logger - %X{code} %msg %n"
    # Logger and appender for proxy send audit logs
    "send":
      level: INFO
      additive: false
      appenders:
        - type: file
          currentLogFilename: logs/send/send.log
          discardingThreshold: 0
          # Rolled and gzipped every minute
          archivedLogFilenamePattern: logs/send/send-%d{yyyy-MM-dd'T'HH:mm}.log.gz
          # One week using minute files
          archivedFileCount: 10080
          logFormat: "%-6level [%d{yyyy-MM-dd'T'HH:mm:ss.SSS'Z'}] [%t] %logger - %X{code} %msg %n"
  appenders:
      # Log to stdout, use this if running in Docker
    - type: console
      # Multi-coloured log format for console output
      logFormat: "%highlight(%-6level) [%d{\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",UTC}] [%green(%t)] %cyan(%logger) - %X{code} %msg %n"
      timeZone: UTC
      # Minute rolled files for stroom/datafeed, will be curl'd/deleted by stroom-log-sender
    - type: file
      currentLogFilename: logs/app/app.log
      discardingThreshold: 0
      archivedLogFilenamePattern: logs/app/app-%d{yyyy-MM-dd'T'HH:mm}.log.gz
      # One week using minute files
      archivedFileCount: 10080
      logFormat: "%-6level [%d{\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",UTC}] [%t] %logger - %X{code} %msg %n"
# This section contains the Stroom Proxy configuration properties
# For more information see:
# https://gchq.github.io/stroom-docs/user-guide/properties.html
# jerseyClients are used for making feed status and content sync REST calls
jerseyClients:
  default:
    tls:
      keyStorePath: "certs/client.jks"
      keyStorePassword: "password"
      trustStorePath: "certs/ca.jks"
      trustStorePassword: "password"
proxyConfig:
  path:
    # By default all files read or written to by stroom-proxy will be in directories relative to
    # the home location. This must be set to an absolute path and also to one that differs from
    # the location of the installed software as it has a different lifecycle.
    home: "/stroomdata/stroom-proxy/home"
  # This is the downstream (in datafeed flow terms) stroom/stroom-proxy used for
  # feed status checks, supplying data receipt rules and verifying API keys.
  downstreamHost:
    scheme: "https"
    port: "443"
    hostname: "stroom.some.domain"
    apiKey: "...API KEY..."
  aggregator:
    maxItemsPerAggregate: 1000
    maxUncompressedByteSize: "1G"
    aggregationFrequency: 10m
  forwardFileDestinations:
    - name: "archive-repo"
      path: "/stroomdata/stroom-proxy/archive-repo"
      subPathTemplate:
        pathTemplate: "${year}/${year}-${month}/${year}-${month}-${day}/${year}-${month}-${day}-${feed}/"
  forwardHttpDestinations:
    - name: "downstream-stroom"
      httpClient:
        tls:
          keyStorePath: "certs/client.jks"
          keyStorePassword: "password"
          trustStorePath: "certs/ca.jks"
          trustStorePassword: "password"
  receive:
    receiptCheckMode: "RECEIPT_POLICY"
Air-Gapped Store Only
This is an example of a Stroom-Proxy instance that is hosted in an environment where it has no direct link to a downstream Stroom/Stroom-Proxy.
All data is aggregated and forwarded to the local file system for transport downstream using other means outside of the scope of this documentation.
server:
  # ... Same as configuration above
logging:
  # ... Same as configuration above
jerseyClients:
  # ... Same as configuration above
proxyConfig:
  path:
    # By default all files read or written to by stroom-proxy will be in directories relative to
    # the home location. This must be set to an absolute path and also to one that differs from
    # the location of the installed software as it has a different lifecycle.
    home: "/stroomdata/stroom-proxy/home"
  # No downstreamHost due to air-gap
  downstreamHost:
    enabled: false
  aggregator:
    maxItemsPerAggregate: 1000
    maxUncompressedByteSize: "1G"
    aggregationFrequency: 10m
  forwardFileDestinations:
      # Repo for a local archive
    - name: "archive-repo"
      path: "/stroomdata/stroom-proxy/archive-repo"
      subPathTemplate:
        pathTemplate: "${year}/${year}-${month}/${year}-${month}-${day}/${year}-${month}-${day}-${feed}/"
      # Repo to be transported downstream around air-gap
    - name: "downstream-repo"
      path: "/stroomdata/stroom-proxy/downstream-repo"
      subPathTemplate:
        pathTemplate: "${year}/${year}-${month}/${year}-${month}-${day}/${year}-${month}-${day}-${feed}/"
  forwardHttpDestinations: []
  receive:
    # No receipt checking due to air-gap. All data accepted.
    receiptCheckMode: "RECEIVE_ALL"