Stroom and Stroom-Proxy Configuration

How to configure Stroom and Stroom-Proxy.

1: Common Configuration
2: Stroom Configuration
3: Stroom Proxy Configuration

The Stroom and Stroom-Proxy applications are built on the same Dropwizard framework, so they have a lot in common when it comes to configuration.

Each application is essentially an executable JAR file that is run with a configuration file, config.yml. This config file is common to all forms of deployment.

1 - Common Configuration

Configuration common to Stroom and Stroom-Proxy.

This YAML file, sometimes known as the Dropwizard configuration file (as it conforms to a structure defined by Dropwizard), is the primary means of configuring Stroom/Stroom-Proxy. At a minimum, this file should be used to configure anything that needs to be set before stroom can start up, e.g. the web server, logging, database connection details, etc. It is also used to configure anything that is specific to a node in a stroom cluster.

If you are using some form of scripted deployment, e.g. Ansible, then this file can be used to set all stroom properties for the environment that stroom runs in. If you are not using scripted deployments, then you can maintain stroom’s node-agnostic configuration properties via the user interface.

Config File Structure

This file contains both the Dropwizard configuration settings (settings for ports, paths and application logging) and the Stroom/Stroom-Proxy application-specific properties configuration. The file is in YAML format and the application properties are located under the appConfig key. For details of the Dropwizard configuration structure, see the Dropwizard Configuration Reference.

The file is split into sections using these keys:

server - Configuration of the web server, e.g. ports, paths, request logging.
logging - Configuration of application logging
jerseyClients - Configuration of the various Jersey HTTP clients in use. See Jersey HTTP Client Configuration.
Application-specific configuration:
- appConfig - The Stroom configuration properties. These properties can be viewed/modified in the user interface.
- proxyConfig - The Stroom-Proxy configuration properties. These properties can only be managed via the YAML file.

The following is an example of the YAML configuration file for Stroom:

# Dropwizard configuration section
server:
  # e.g. ports and paths
logging:
  # e.g. logging levels/appenders

jerseyClients:
  DEFAULT:
    # Configuration of the named client

# Stroom properties configuration section
appConfig:
  commonDbDetails:
    connection:
      jdbcDriverClassName: ${STROOM_JDBC_DRIVER_CLASS_NAME:-com.mysql.cj.jdbc.Driver}
      jdbcDriverUrl: ${STROOM_JDBC_DRIVER_URL:-jdbc:mysql://localhost:3307/stroom?useUnicode=yes&characterEncoding=UTF-8}
      jdbcDriverUsername: ${STROOM_JDBC_DRIVER_USERNAME:-stroomuser}
      jdbcDriverPassword: ${STROOM_JDBC_DRIVER_PASSWORD:-stroompassword1}
  contentPackImport:
    enabled: true
  ...

The following is an example of the YAML configuration file for Stroom-Proxy:

# Dropwizard configuration section
server:
  # e.g. ports and paths
logging:
  # e.g. logging levels/appenders

jerseyClients:
  DEFAULT:
    # Configuration of the named client

# Stroom-Proxy properties configuration section
proxyConfig:
  path:
    home: /some/path
  ...

`appConfig` Section

The appConfig section is special as it maps to the Properties seen in the Stroom user interface so values can be managed in the file or via the Properties screen in the Stroom UI. The other sections of the file can only be managed via the YAML file. In the Stroom user interface, properties are named with a dot notation key, e.g. stroom.contentPackImport.enabled. Each part of the dot notation property name represents a key in the YAML file, e.g. for this example, the location in the YAML would be:

appConfig:
  contentPackImport:
    enabled: true   # stroom.contentPackImport.enabled

The stroom part of the dot notation name is replaced with appConfig.
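The naming rule can be illustrated with a short Python sketch. It is not part of Stroom: the helper names `property_to_yaml_path` and `set_property` are hypothetical, and the sketch only models the dot-notation-to-YAML mapping described above, not how Stroom actually loads or stores properties.

```python
from typing import Any, Dict, List


def property_to_yaml_path(property_name: str) -> List[str]:
    """Convert e.g. 'stroom.contentPackImport.enabled' into the YAML keys
    that locate the property (hypothetical helper, for illustration only)."""
    parts = property_name.split(".")
    if parts[0] != "stroom":
        raise ValueError(f"Expected a name starting with 'stroom.': {property_name}")
    # The leading 'stroom' part is replaced with the 'appConfig' root key.
    return ["appConfig"] + parts[1:]


def set_property(config: Dict[str, Any], property_name: str, value: Any) -> None:
    """Set a value in a nested dict that mirrors the YAML file structure."""
    *parents, leaf = property_to_yaml_path(property_name)
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


config: Dict[str, Any] = {}
set_property(config, "stroom.contentPackImport.enabled", True)
print(config)  # {'appConfig': {'contentPackImport': {'enabled': True}}}
```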

For more details on the link between this YAML file and Stroom Properties, see Properties

Variable Substitution

The YAML configuration file supports Bash-style variable substitution in the form of:

${ENV_VAR_NAME:-value_if_not_set}

This allows values to be set either directly in the file or via an environment variable, e.g.

      jdbcDriverClassName: ${STROOM_JDBC_DRIVER_CLASS_NAME:-com.mysql.cj.jdbc.Driver}

In the above example, if the STROOM_JDBC_DRIVER_CLASS_NAME environment variable is not set then the value com.mysql.cj.jdbc.Driver will be used instead.
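As a rough illustration of the `${ENV_VAR_NAME:-value_if_not_set}` rule, the following Python sketch performs the same kind of substitution on a line of text. It is an approximation under stated assumptions, not the substitution code used by Stroom or Dropwizard; `substitute` is a hypothetical name, and a variable that is unset and has no default is simply left untouched.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} or ${NAME:-default}; the default may not contain '}'.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def substitute(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace ${VAR:-default} with VAR's value if it is set, else with the default."""
    env = os.environ if env is None else env

    def replace(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        return match.group(0)  # Assumption: unset with no default is left as-is.

    return _VAR_PATTERN.sub(replace, text)


line = "jdbcDriverClassName: ${STROOM_JDBC_DRIVER_CLASS_NAME:-com.mysql.cj.jdbc.Driver}"
print(substitute(line, env={}))
print(substitute(line, env={"STROOM_JDBC_DRIVER_CLASS_NAME": "org.mariadb.jdbc.Driver"}))
```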

Typed Values

YAML supports typed values rather than just strings, see https://yaml.org/refcard.html. YAML understands booleans, strings, integers, floating point numbers, as well as sequences/lists and maps. Some properties will be represented differently in the user interface than in the YAML file. This is due to how values are stored in the database and how the current user interface works. This will likely be improved in future versions. For details of how different types are represented in the YAML and the UI, see Data Types.

Server configuration

The server section controls the configuration of the Jetty web server.

For full details of how to configure the server section, see the Servers section of the Dropwizard Configuration Reference.

The following is an example of the configuration for an application listening on HTTP.

server:
  # The base path for the main application and its API
  applicationContextPath: "/"
  # The base path for the administration pages/API
  # For Stroom-Proxy the default is /proxyAdmin
  adminContextPath: "/stroomAdmin"

  # The scheme/port for the main application and its API
  applicationConnectors:
    - type: http
      # For Stroom-Proxy the default is 8090
      port: 8080
      # Uses X-Forwarded-*** headers in request log instead of proxy server details.
      useForwardedHeaders: true

  # The scheme/port for the administration pages/API
  adminConnectors:
    - type: http
      # For Stroom-Proxy the default is 8091
      port: 8081
      useForwardedHeaders: true

Common Application Configuration

This section details configuration that is common to both the Stroom appConfig and Stroom-Proxy proxyConfig sections.

Receive Configuration

Configuration for controlling the receipt of data into Stroom and Stroom-Proxy through the /datafeed API.

appConfig / proxyConfig:
  receive:
    # An allow-list containing IP addresses or fully qualified host names to verify that the direct sender
    # of a request (e.g. a load balancer or reverse proxy) is trusted to supply certificate/DN headers
    # as configured with 'x509CertificateHeader' and 'x509CertificateDnHeader'.
    # If this list is null/empty then no check will be made on the client's address.
    allowedCertificateProviders: []
    # Standard cache configuration block for the cache of authenticated Datafeed Keys.
    # This cache is used to avoid having to re-verify every data feed key.
    authenticatedDataFeedKeyCache:
    # If true, the sender will be authenticated using a token, certificate or data feed key,
    # depending on the types listed in enabledAuthenticationTypes. If the sender
    # can't be authenticated an error will be returned to the client.
    # If false, then authentication will be performed if a token/key/certificate
    # is present, otherwise data will be accepted without a sender identity
    authenticationRequired: true
    # The meta key that is used to identify the owner of a Data Feed Key. This
    # may be an AccountId or similar. It must be provided as a header when sending data
    # using the associated Data Feed Key, and its value will be checked against the value
    # held with the hashed Data Feed Key by Stroom. Default value is 'AccountId'.
    # Case does not matter
    dataFeedKeyOwnerMetaKey: "AccountId"
    # The directory where Stroom will look for datafeed key files.
    # Only used if enabledAuthenticationTypes includes DATA_FEED_KEY.
    # If the value is a relative path then it will be treated as being
    # relative to stroom.path.home. Data feed key files must have the extension .json.
    # Files in sub-directories will be ignored.
    dataFeedKeysDir: "data_feed_keys"
    # The types of authentication that are enabled for data receipt.
    # One or more of:
    # TOKEN - A Stroom API Key or an OAuth token in the 'Authorization' header
    # CERTIFICATE - An X509 certificate on the request or a DN in the header configured
    #               by .receive.x509CertificateDnHeader
    # DATA_FEED_KEY - A Stroom Data Feed Key in the 'Authorization' header
    enabledAuthenticationTypes:
    - "TOKEN"
    - "CERTIFICATE"
    # If receiptCheckMode is RECEIPT_POLICY or FEED_STATUS and stroom/proxy is
    # unable to perform the receipt check, then this action will be used as a fallback
    # until the receipt check can be successfully performed
    fallbackReceiveAction: "RECEIVE"
    # If true the client is not required to set the 'Feed' header. If Feed is not present
    # a feed name will be generated based on the template specified by the
    # 'feedNameTemplate' property. If false (the default), a populated 'Feed'
    # header will be required
    feedNameGenerationEnabled: false
    # The set of header keys that are mandatory if feedNameGenerationEnabled is set to true.
    # Should be set to complement the header keys used in 'feedNameTemplate', but may be a
    # sub-set of those in the template to allow for optional headers
    feedNameGenerationMandatoryHeaders:
    - "AccountId"
    - "Component"
    - "Format"
    - "Schema"
    # A template for generating a feed name from a set of headers. The value of
    # each header referenced in the template will have any unsuitable characters
    # replaced with '_'.
    # If this property is set in the YAML file, use single quotes to prevent the
    # variables being expanded when the config file is loaded
    feedNameTemplate: "${accountid}-${component}-${format}-${schema}"
    # If defined then states the maximum size of a request (uncompressed for gzip requests).
    # Will return a 413 Content Too Large response code for any requests exceeding this
    # value. If undefined then there is no limit to the size of the request.
    maxRequestSize: null
    # Set of supported meta type names. This set must contain all of the names
    # in the default value for this property but can contain additional names.
    metaTypes:
    - "Context"
    - "Detections"
    - "Error"
    - "Events"
    - "Meta Data"
    - "Raw Events"
    - "Raw Reference"
    - "Records"
    - "Reference"
    - "Test Events"
    - "Test Reference"
    # Controls how or whether data is checked on receipt. Valid values
    # (FEED_STATUS|RECEIPT_POLICY|RECEIVE_ALL|REJECT_ALL|DROP_ALL)
    receiptCheckMode: "FEED_STATUS"
    # The format of the Distinguished Name used in the certificate. Valid values are
    # LDAP and OPEN_SSL, where LDAP is the default
    x509CertificateDnFormat: "LDAP"
    # The HTTP header key used to extract the distinguished name (DN) as obtained from an X509 certificate.
    # This is used when a load balancer does the SSL/mTLS termination and passes the client DN through
    # in a header. Only used for
    # authentication if a value is set and 'enabledAuthenticationTypes' includes CERTIFICATE
    x509CertificateDnHeader: "X-SSL-CLIENT-S-DN"
    # The HTTP header key used to extract an X509 certificate. This is used when a load balancer does the
    # SSL/mTLS termination and passes the client certificate through in a header. Only used for
    # authentication if a value is set and 'enabledAuthenticationTypes' includes CERTIFICATE
    x509CertificateHeader: "X-SSL-CERT"
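To make the feed name generation properties more concrete, here is a Python sketch of how feedNameTemplate, feedNameGenerationMandatoryHeaders and the "unsuitable characters" rule could fit together. It is an illustration, not Stroom's implementation: `generate_feed_name` is hypothetical, the set of characters treated as unsuitable is an assumption, and so is the case-insensitive matching of header keys against the lower-case template variables.

```python
import re
from typing import Iterable, Mapping

# Assumption: characters other than letters, digits, '_' and '-' are "unsuitable".
_UNSUITABLE_CHARS = re.compile(r"[^A-Za-z0-9_-]")
_TEMPLATE_VARIABLE = re.compile(r"\$\{([^}]+)\}")


def generate_feed_name(template: str,
                       headers: Mapping[str, str],
                       mandatory_headers: Iterable[str]) -> str:
    """Hypothetical sketch of feedNameTemplate expansion."""
    # Assumption: header keys are matched case-insensitively, so the
    # 'AccountId' header satisfies the '${accountid}' template variable.
    by_lower_key = {key.lower(): value for key, value in headers.items()}
    missing = [name for name in mandatory_headers if not by_lower_key.get(name.lower())]
    if missing:
        raise ValueError(f"Missing mandatory headers: {missing}")

    def expand(match: re.Match) -> str:
        value = by_lower_key.get(match.group(1).lower(), "")
        return _UNSUITABLE_CHARS.sub("_", value)

    return _TEMPLATE_VARIABLE.sub(expand, template)


headers = {"AccountId": "1234", "Component": "web server", "Format": "JSON", "Schema": "ecs/v8"}
feed = generate_feed_name(
    "${accountid}-${component}-${format}-${schema}",
    headers,
    ["AccountId", "Component", "Format", "Schema"],
)
print(feed)  # 1234-web_server-JSON-ecs_v8
```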

Cache Configuration

Multiple configuration branches in both Stroom and Stroom-Proxy have one or more properties for configuring a cache. Each of these share the same structure and will typically be named xxxCache, e.g. feedStatusCache or metaTypeCache.

Warning

The default values for each property within the cache config will be specific to the cache. Care needs to be taken when changing the cache properties to avoid changing the behaviour of the cache, e.g. changing from having an expireAfterWrite value to having an expireAfterAccess value may prevent frequently read items from aging off as expected, as each read resets the expireAfterAccess timer.

      xxxCache:
        # Specifies that each entry should be automatically removed from the cache once
        # this duration has elapsed after the entry's creation, the most recent replacement of
        # its value, or its last read. In ISO-8601 duration format, e.g. 'PT10M'. If no value is set then
        # entries will not be aged out based on these criteria.
        expireAfterAccess: 
        # Specifies that each entry should be automatically removed from the cache once
        # a fixed duration has elapsed after the entry's creation, or the most recent replacement of its value.
        # In ISO-8601 duration format, e.g. 'PT5M'. If no value is set then entries will not be aged out based on
        # these criteria.
        expireAfterWrite:
        # Specifies the maximum number of entries the cache may contain. Note that the cache
        # may evict an entry before this limit is exceeded or temporarily exceed the threshold while evicting.
        # As the cache size grows close to the maximum, the cache evicts entries that are less likely to be used
        # again. For example, the cache may evict an entry because it hasn't been used recently or very often.
        # When size is zero, elements will be evicted immediately after being loaded into the cache. This can
        # be useful in testing, or to disable caching temporarily without a code change. If no value is set then
        # no size limit will be applied
        maximumSize:
        # Specifies that each entry should be automatically refreshed in the cache after
        # a fixed duration has elapsed after the entry's creation, or the most recent replacement of its value.
        # In ISO-8601 duration format, e.g. 'PT5M'. Refreshing is performed asynchronously and the current value
        # is provided until the refresh has occurred. This mechanism allows the cache to update values without any
        # impact on performance
        refreshAfterWrite:
        # Determines whether/how statistics are captured on cache usage
        # (e.g. hits, misses, entries, etc.). Values are (NONE, INTERNAL, DROPWIZARD_METRICS).
        # NONE means capture no stats, offering a very slight performance gain, but the Caches screen in Stroom
        # won't be able to show any stats for this cache.
        # INTERNAL means the stats are captured but are only accessible via the Stroom Caches screen, thus not
        # suitable for Stroom-Proxy.
        # DROPWIZARD_METRICS means the stats are captured and are accessible via the Stroom Caches screen AND via
        # the metrics servlet on the admin port for integration with tools like Graphite/Collectd
        # The default for Stroom is INTERNAL, the default for Stroom-Proxy is DROPWIZARD_METRICS
        statisticsMode:
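The difference between expireAfterWrite and expireAfterAccess, which the warning above cautions about, can be seen in a toy Python cache. This is a sketch of the described semantics only (`ToyExpiringCache` and `FakeClock` are hypothetical, durations are in seconds, and maximumSize, refreshAfterWrite and statisticsMode are not modelled); it is not the cache library Stroom uses.

```python
from typing import Callable, Dict, Hashable, List, Optional, Tuple


class ToyExpiringCache:
    """Illustrates expireAfterWrite vs expireAfterAccess; durations are in seconds."""

    def __init__(self, clock: Callable[[], float],
                 expire_after_write: Optional[float] = None,
                 expire_after_access: Optional[float] = None) -> None:
        self._clock = clock
        self._expire_after_write = expire_after_write
        self._expire_after_access = expire_after_access
        # key -> (value, time written, time last accessed)
        self._entries: Dict[Hashable, Tuple[object, float, float]] = {}

    def put(self, key: Hashable, value: object) -> None:
        now = self._clock()
        self._entries[key] = (value, now, now)

    def get(self, key: Hashable) -> Optional[object]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, written, accessed = entry
        now = self._clock()
        write_expired = (self._expire_after_write is not None
                         and now - written >= self._expire_after_write)
        access_expired = (self._expire_after_access is not None
                          and now - accessed >= self._expire_after_access)
        if write_expired or access_expired:
            del self._entries[key]
            return None
        # A successful read resets the access timer but not the write timer.
        self._entries[key] = (value, written, now)
        return value


class FakeClock:
    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
by_write = ToyExpiringCache(clock, expire_after_write=300)    # like 'PT5M'
by_access = ToyExpiringCache(clock, expire_after_access=300)  # like 'PT5M'
by_write.put("feed", "RECEIVE")
by_access.put("feed", "RECEIVE")

# Read the entry every 4 minutes.
results: List[Tuple[int, Optional[object], Optional[object]]] = []
for minute in (4, 8, 12):
    clock.now = minute * 60.0
    results.append((minute, by_write.get("feed"), by_access.get("feed")))
print(results)
# [(4, 'RECEIVE', 'RECEIVE'), (8, None, 'RECEIVE'), (12, None, 'RECEIVE')]
```

Because the entry is read more often than every five minutes, the expireAfterAccess cache never ages it off, while the expireAfterWrite cache drops it five minutes after it was written.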

Open ID Configuration

Both Stroom and Stroom-Proxy share the same configuration structure for configuring Open ID Connect authentication. This section of config is only applicable if appConfig/proxyConfig.security.authentication.openId.identityProviderType is set to EXTERNAL_IDP.

appConfig / proxyConfig:
  security:
    authentication:
      openId:
        # A set of audience claim values, one of which must appear in the audience
        # claim in the token.
        # If empty, no validation will be performed on the audience claim
        # If audienceClaimRequired is false and there is no audience claim in the token,
        # then allowedAudiences will be ignored
        allowedAudiences: []
        # If true the token will fail validation if the audience claim is not present
        # and allowedAudiences is not empty
        audienceClaimRequired: false
        # The authentication endpoint used in OpenId authentication
        # Should only be set if not using a configuration endpoint
        authEndpoint: null
        # If custom scopes are required for client_credentials requests then this should be
        # set to replace the default of 'openid'. E.g. for Azure AD you will likely need to set
        # this to 'openid' and '<your-app-id-uri>/.default'
        clientCredentialsScopes:
        - "openid"
        # The client ID used in OpenId authentication.
        clientId: null
        # The client secret used in OpenId authentication.
        clientSecret: null
        # If using an AWS load balancer to handle the authentication, set this to the Amazon
        # Resource Names (ARN) of the load balancer(s) fronting stroom, which will be something
        # like 'arn:aws:elasticloadbalancing:region-code:account-id:loadbalancer/app/load-balancer-name/load-balancer-id'.
        # This config value will be used to verify the 'signer' in the JWT header.
        # Each value is the first N characters of the ARN and as a minimum must include up to
        # the colon after the account-id, i.e.
        # 'arn:aws:elasticloadbalancing:region-code:account-id:'
        # See https://docs.aws.amazon.com/elasticloadbalancing/latest/application/listener-authenticate-users.html#user-claims-encodin
        expectedSignerPrefixes: []
        # Some OpenId providers, e.g. AWS Cognito, require a form to be used for token requests.
        formTokenRequest: true
        # A template to build the user's full name using claim values as variables in the
        # template. E.g. '${firstName} ${lastName}' or '${name}'.
        # If this property is set in the YAML file, use single quotes to prevent the
        # variables being expanded when the config file is loaded. Note: claim names are
        # case sensitive
        fullNameClaimTemplate: "${name}"
        # The type of Open ID Connect identity provider that stroom/proxy
        # will use for authentication. Valid values are:
        # INTERNAL_IDP - Stroom's internal IDP. Not valid for Stroom-Proxy.
        # EXTERNAL_IDP - An external IDP such as KeyCloak/Cognito.
        # TEST_CREDENTIALS - Use hard-coded authentication credentials for test/demo only.
        # NO_IDP - No IDP is used. API keys are set in config for feed status checks. Only for use by Stroom-Proxy.
        # Changing this property will require a restart of the application
        identityProviderType: "NO_IDP"
        # The issuer used in OpenId authentication.
        # Should only be set if not using a configuration endpoint
        issuer: null
        # The URI to obtain the JSON Web Key Set from in OpenId authentication
        # Should only be set if not using a configuration endpoint
        jwksUri: null
        # The logout endpoint for the identity provider
        # This is not typically provided by the configuration endpoint
        logoutEndpoint: null
        # The name of the URI parameter to use when passing the logout redirect URI to the IDP.
        # This is here as the spec seems to have changed from 'redirect_uri' to
        # 'post_logout_redirect_uri'
        logoutRedirectParamName: "post_logout_redirect_uri"
        # You can set an openid-configuration URL to automatically configure much of the openid
        # settings. Without this the other endpoints etc. must be set manually
        openIdConfigurationEndpoint: null
        # If the token is signed by AWS then use this pattern to form the URI to obtain the
        # public key from. The pattern supports the variables '${awsRegion}' and '${keyId}'.
        # Multiple instances of a variable are also supported.
        # If this property is set in the YAML file, use single quotes to prevent the
        # variables being expanded when the config file is loaded.
        publicKeyUriPattern: "https://public-keys.auth.elb.${awsRegion}.amazonaws.com/${keyId}"
        # If custom auth flow request scopes are required then this should be set to replace
        # the defaults of 'openid' and 'email'.
        requestScopes:
        - "openid"
        - "email"
        # The token endpoint used in OpenId authentication
        # Should only be set if not using a configuration endpoint
        tokenEndpoint: null
        # The Open ID Connect claim used to link an identity on the IDP to a stroom user.
        # Must uniquely identify the user on the IDP and not be subject to change. Uses 'sub' by
        # default
        uniqueIdentityClaim: "sub"
        # The Open ID Connect claim used to provide a more human friendly username for a user
        # than that provided by uniqueIdentityClaim. It is not guaranteed to be unique and may
        # change
        userDisplayNameClaim: "preferred_username"
        # A set of issuers (in addition to the 'issuer' property that is provided by the IDP)
        # that are deemed valid when seen in a token. If no additional valid issuers are
        # required then set this to an empty set. Also this is used to validate the 'issuer'
        # returned by the IDP when it is not a sub path of 'openIdConfigurationEndpoint'. If
        # this set is empty then Stroom will verify that the 'issuer' returned by the IDP is a
        # sub path of 'openIdConfigurationEndpoint'.
        validIssuers: []
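The expectedSignerPrefixes description above amounts to a prefix match on the 'signer' value in the JWT header. A minimal Python sketch of that check follows, assuming the minimum-length rule can be expressed as "at least five ':' separators" (`arn:partition:service:region:account-id:`). The function names and the example ARN are hypothetical, and the sketch does not model what Stroom does when the list is empty.

```python
from typing import Sequence

# Hypothetical example ARN of an application load balancer.
SIGNER = ("arn:aws:elasticloadbalancing:eu-west-2:123456789012:"
          "loadbalancer/app/my-load-balancer/50dc6c495c0c9188")


def check_prefix(prefix: str) -> None:
    """A prefix must run at least up to the colon after the account ID,
    i.e. contain five ':' separators (arn:partition:service:region:account-id:)."""
    if prefix.count(":") < 5:
        raise ValueError(f"Prefix too short: {prefix!r}")


def signer_is_expected(signer: str, expected_prefixes: Sequence[str]) -> bool:
    """True if the JWT 'signer' header value starts with one of the configured prefixes.
    Only meaningful when at least one prefix is configured."""
    for prefix in expected_prefixes:
        check_prefix(prefix)
    return any(signer.startswith(prefix) for prefix in expected_prefixes)


print(signer_is_expected(SIGNER, ["arn:aws:elasticloadbalancing:eu-west-2:123456789012:"]))  # True
print(signer_is_expected(SIGNER, ["arn:aws:elasticloadbalancing:eu-west-2:999999999999:"]))  # False
```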

Jersey HTTP Client Configuration

Stroom and Stroom-Proxy use the Jersey client for making HTTP connections with other nodes or other systems (e.g. Open ID Connect identity providers). In the YAML file, the jerseyClients key controls the configuration of the various clients in use.

To allow complete control of the client configuration, Stroom uses the concept of named client configurations. Each named client is unique to a destination (where a destination is typically a server or a cluster of functionally identical servers), so the connections to each destination can be configured independently.

The client names are as follows:

DEFAULT - The default client configuration used if a named configuration is not present.
AWS_PUBLIC_KEYS - Connections to fetch AWS public keys used in Open ID Connect authentication.
DOWNSTREAM - Connections to downstream proxy/stroom instances to check feed status (Stroom-Proxy only).
OPEN_ID - Connections to an Open ID Connect identity provider, e.g. Cognito, Azure AD, KeyCloak, etc.
STROOM - Inter-node communications within the Stroom cluster (Stroom only).

Note

If a named configuration does not exist then the configuration for DEFAULT will be used. If DEFAULT is not defined in the configuration then the Dropwizard defaults will be used.
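The fallback order in the note above (named client, then DEFAULT, then the Dropwizard defaults) can be sketched in Python as follows. `resolve_client_config` is a hypothetical helper, and the sketch assumes a block is used whole rather than merged key by key with DEFAULT, which the note does not specify.

```python
from typing import Any, Dict, Mapping

ClientConfig = Dict[str, Any]


def resolve_client_config(client_name: str,
                          jersey_clients: Mapping[str, ClientConfig],
                          dropwizard_defaults: ClientConfig) -> ClientConfig:
    """Return the named client's block, else DEFAULT, else the framework defaults.
    Assumption: blocks are not merged key by key."""
    if client_name in jersey_clients:
        return jersey_clients[client_name]
    if "DEFAULT" in jersey_clients:
        return jersey_clients["DEFAULT"]
    return dropwizard_defaults


# Values mirror the short jerseyClients example that follows this note.
jersey_clients = {"DEFAULT": {"timeout": "500ms"}, "STROOM": {"timeout": "30s"}}
framework_defaults = {"timeout": "(Dropwizard default)"}

print(resolve_client_config("STROOM", jersey_clients, framework_defaults))   # {'timeout': '30s'}
print(resolve_client_config("OPEN_ID", jersey_clients, framework_defaults))  # {'timeout': '500ms'}
print(resolve_client_config("OPEN_ID", {}, framework_defaults))              # {'timeout': '(Dropwizard default)'}
```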

The following is an example of how the clients are configured in the YAML file:

jerseyClients:
  DEFAULT:
    # Default client configuration, e.g.
    timeout: 500ms
  STROOM:
    # Configuration items for stroom inter-node communications
    timeout: 30s
  # etc.

The configuration keys (along with their default values and descriptions) for each client can be found in the JerseyClient section of the Dropwizard Configuration Reference.

The following is another example including most keys:

jerseyClients:
  DEFAULT:
    minThreads: 1
    maxThreads: 128
    workQueueSize: 8
    gzipEnabled: true
    gzipEnabledForRequests: true
    chunkedEncodingEnabled: true
    timeout: 500ms
    connectionTimeout: 500ms
    timeToLive: 1h
    cookiesEnabled: false
    maxConnections: 1024
    maxConnectionsPerRoute: 1024
    keepAlive: 0ms
    retries: 0
    userAgent: <application name> (<client name>)
    proxy:
      host: 192.168.52.11
      port: 8080
      scheme: http
      auth:
        username: secret
        password: stuff
        authScheme: NTLM
        realm: realm
        hostname: host
        domain: WINDOWSDOMAIN
        credentialType: NT
      nonProxyHosts:
        - localhost
        - '192.168.52.*'
        - '*.example.com'
    tls:
      protocol: TLSv1.2
      provider: SunJSSE
      verifyHostname: true
      keyStorePath: /path/to/file
      keyStorePassword: changeit
      keyStoreType: JKS
      trustStorePath: /path/to/file
      trustStorePassword: changeit
      trustStoreType: JKS
      trustSelfSignedCertificates: false
      supportedProtocols: TLSv1.1,TLSv1.2
      supportedCipherSuites: TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256
      certAlias: alias-of-specific-cert

Note

Duration values in the Jersey client configuration blocks are different to Stroom Durations defined in Stroom properties. They are defined as a numeric value and a unit suffix. Typical suffixes are (in ascending order): ns, us, ms, s, m, h, d. ISO 8601 duration strings are NOT supported, nor are values without a suffix. See the Dropwizard documentation for the full list of duration suffixes and their aliases.
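The accepted and rejected forms described in the note above can be illustrated with a small Python parser. It is a sketch, not Dropwizard's parser: `parse_client_duration` and `is_valid_client_duration` are hypothetical names, only the typical lower-case suffixes listed in the note are supported (no aliases), and the result is in integer nanoseconds to avoid floating-point rounding.

```python
import re

# Nanoseconds per unit for the typical suffixes only; Dropwizard's aliases are not covered.
_NANOS_PER_UNIT = {
    "ns": 1,
    "us": 1_000,
    "ms": 1_000_000,
    "s": 1_000_000_000,
    "m": 60 * 1_000_000_000,
    "h": 60 * 60 * 1_000_000_000,
    "d": 24 * 60 * 60 * 1_000_000_000,
}
_CLIENT_DURATION = re.compile(r"(\d+)\s*([a-z]+)")


def parse_client_duration(text: str) -> int:
    """Parse e.g. '500ms' or '1h' into nanoseconds; reject ISO 8601 and bare numbers."""
    match = _CLIENT_DURATION.fullmatch(text.strip())
    if match is None or match.group(2) not in _NANOS_PER_UNIT:
        raise ValueError(f"Not a Jersey client duration: {text!r}")
    return int(match.group(1)) * _NANOS_PER_UNIT[match.group(2)]


def is_valid_client_duration(text: str) -> bool:
    try:
        parse_client_duration(text)
        return True
    except ValueError:
        return False


for value in ("500ms", "30s", "1h", "PT30S", "500"):
    print(value, is_valid_client_duration(value))
```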

Note

The paths used for the key and trust stores will be treated in the same way as Stroom property paths, i.e. relative to stroom.path.home if relative, and supporting variable substitution.
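The relative-path rule for key and trust stores can be sketched in a few lines of Python. `resolve_store_path` is a hypothetical helper, the example paths are made up, and the variable substitution that Stroom also applies to these paths is deliberately not modelled.

```python
import posixpath


def resolve_store_path(path: str, home_dir: str) -> str:
    """Keep absolute paths; resolve relative ones against the application's home
    directory (stroom.path.home). Variable substitution is not modelled here."""
    if posixpath.isabs(path):
        return path
    return posixpath.normpath(posixpath.join(home_dir, path))


print(resolve_store_path("certs/client.jks", "/stroom"))         # /stroom/certs/client.jks
print(resolve_store_path("/etc/pki/truststore.jks", "/stroom"))  # /etc/pki/truststore.jks
```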

Logging Configuration

The Dropwizard configuration file controls all the logging by the application. In addition to the main application log, there are additional logs such as stroom user events (for audit), Stroom-Proxy send and receive logs and database migration logs.

For full details of the logging configuration, see Dropwizard Logging Configuration.

Request Log

The request log is slightly different to the other logs. It logs all requests to the web server. It is configured in the server section.

The property archivedLogFilenamePattern controls rolling of the active log file. The date pattern in the filename determines how often the log files are rolled. In this example, files will be rolled every minute.

server:
  requestLog:
    appenders:
    - type: file
      currentLogFilename: logs/access/access.log
      discardingThreshold: 0
      # Rolled and gzipped every minute
      archivedLogFilenamePattern: logs/access/access-%d{yyyy-MM-dd'T'HH:mm}.log.gz
      archivedFileCount: 10080
      logFormat: '%h %l "%u" [%t] "%r" %s %b "%i{Referer}" "%i{User-Agent}" %D'

Logback Logs

Dropwizard uses Logback for application level logging. All logs in Stroom and Stroom-Proxy apart from the request log are Logback based logs.

Logback uses the concept of Loggers and Appenders. A Logger is a named thing that produces log messages. An Appender is an output that a Logger can append its log messages to. Typical Appenders are:

File - appends messages to a file that may or may not be rolled.
Console - appends messages to stdout.
Syslog - appends messages to syslog.

Loggers

A Logger can append to more than one Appender if required. For example, the default configuration file for Stroom has two appenders for the application logs. The rolled files from one appender are POSTed to Stroom so that it can index its own logs, and are then deleted. The rolled files from the other appender are intended to remain on the server until archived off, allowing an administrator to view them.

A Logger can be configured with a severity; valid severities are (TRACE, DEBUG, INFO, WARN, ERROR), and OFF disables the logger. The severity set on a logger means that only messages with that severity or higher will be logged; the rest are discarded.
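Logback orders its levels TRACE < DEBUG < INFO < WARN < ERROR, and setting a logger to OFF suppresses everything. The threshold comparison can be sketched in Python like this; `is_logged` is a hypothetical helper used only to illustrate the ordering, not part of Logback's API.

```python
# Logback levels, least to most severe. OFF is only meaningful as a logger setting.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "OFF"]


def is_logged(message_level: str, logger_level: str) -> bool:
    """True if a message at message_level passes a logger set to logger_level."""
    if message_level == "OFF":
        raise ValueError("OFF is not a message level")
    return LEVELS.index(message_level) >= LEVELS.index(logger_level)


print(is_logged("INFO", "WARN"))   # False
print(is_logged("ERROR", "WARN"))  # True
print(is_logged("ERROR", "OFF"))   # False
```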

Logger names are typically the name of the Java class that is producing the log message. You don’t need to understand too much about Java classes as you are only likely to change logger severities when requested by one of the developers. Some loggers, such as event-logger do not have a Java class name.

As an example this is a portion of a Stroom config.yml file to illustrate the different loggers/appenders:

logging:
  # This is the root logging severity level for all loggers. Only messages >= WARN will be logged unless overridden
  # for a specific logger
  level: WARN

  # All the named loggers
  loggers:
    # Logs useful information about stroom. Only set DEBUG on specific 'stroom' classes or packages
    # due to the large volume of logs that would be produced for all of 'stroom' in DEBUG.
    stroom: INFO
    # Logs useful information about dropwizard when booting stroom
    io.dropwizard: INFO
    # Logs useful information about the jetty server when booting stroom
    org.eclipse.jetty: INFO
    # Logs REST request/responses with headers/payloads. Set this to OFF to disable that logging.
    org.glassfish.jersey.logging.LoggingFeature: INFO
    # Logs summary information about FlyWay database migrations
    org.flywaydb: INFO
    # Logger and custom appender for audit logs
    event-logger:
      level: INFO
      # Prevents messages from this logger from being sent to other appenders
      additive: false
      appenders:
        - type: file
          currentLogFilename: logs/user/user.log
          discardingThreshold: 0
          # Rolled every minute
          archivedLogFilenamePattern: logs/user/user-%d{yyyy-MM-dd'T'HH:mm}.log
          # Minute rolled logs older than a week will be deleted. Note rolled logs are deleted
          # based on the age of the window they contain, not the number of them. This value should be greater
          # than the maximum time stroom is not producing events for.
          archivedFileCount: 10080
          logFormat: "%msg%n"
    # Logger and custom appender for the flyway DB migration SQL output
    org.flywaydb.core.internal.sqlscript:
      level: DEBUG
      additive: false
      appenders:
        - type: file
          currentLogFilename: logs/migration/migration.log
          discardingThreshold: 0
          # Rolled every day
          archivedLogFilenamePattern: logs/migration/migration-%d{yyyy-MM-dd}.log
          archivedFileCount: 10
          logFormat: "%-6level [%d{\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",UTC}] [%t] %logger - %X{code} %msg %n"
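Logback resolves a logger's effective level from the nearest ancestor in the dot-separated name hierarchy that has a level set, falling back to the root level. That is why `stroom: INFO` above covers every class under the `stroom` package. The Python sketch below models only that lookup (additivity and appenders are ignored); `effective_level` and the class names in the usage lines are hypothetical.

```python
from typing import Mapping

# Logger levels from the example above (the root 'level' is WARN).
CONFIGURED = {
    "stroom": "INFO",
    "io.dropwizard": "INFO",
    "org.eclipse.jetty": "INFO",
    "org.glassfish.jersey.logging.LoggingFeature": "INFO",
    "org.flywaydb": "INFO",
    "event-logger": "INFO",
    "org.flywaydb.core.internal.sqlscript": "DEBUG",
}


def effective_level(logger_name: str, configured: Mapping[str, str], root_level: str) -> str:
    """Walk up the dot-separated logger hierarchy to the nearest configured ancestor."""
    name = logger_name
    while name:
        if name in configured:
            return configured[name]
        name = name.rpartition(".")[0]  # e.g. 'stroom.a.B' -> 'stroom.a'
    return root_level


print(effective_level("stroom.example.SomeClass", CONFIGURED, "WARN"))                    # INFO
print(effective_level("org.flywaydb.core.internal.sqlscript.Runner", CONFIGURED, "WARN"))  # DEBUG
print(effective_level("com.example.Other", CONFIGURED, "WARN"))                           # WARN
```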

Appenders

The following is an example of the default appenders that will be used for all loggers unless they have their own custom appender configured.

logging:
  # Appenders for all loggers except for where a logger has a custom appender configured
  appenders:

    # stdout
  - type: console
    # Multi-coloured log format for console output
    logFormat: "%highlight(%-6level) [%d{\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",UTC}] [%green(%t)] %cyan(%logger) - %X{code} %msg %n"
    timeZone: UTC
#
    # Minute rolled files for stroom/datafeed, will be curl'd/deleted by stroom-log-sender
  - type: file
    currentLogFilename: logs/app/app.log
    discardingThreshold: 0
    # Rolled and gzipped every minute
    archivedLogFilenamePattern: logs/app/app-%d{yyyy-MM-dd'T'HH:mm}.log.gz
    # One week using minute files
    archivedFileCount: 10080
    logFormat: "%-6level [%d{\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",UTC}] [%t] %logger - %X{code} %msg %n"

Log Rolling

Rolling of log files can be done based on size of file or time. The archivedLogFilenamePattern property controls the rolling behaviour. The rolling policy is determined from the filename pattern, e.g. a pattern with a minute precision date format will be rolled every minute. The following is an example of an appender that rolls based on the size of the log file:

  - type: file
    currentLogFilename: logs/app.log
    # The name pattern, where %i is a sequential number indicating age, where 1 is the most recent
    archivedLogFilenamePattern: logs/app-%i.log
    # The maximum number of rolled files to keep
    archivedFileCount: 10
    # The maximum size of a log file
    maxFileSize: "100MB"
    logFormat: "%-6level [%d{\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",UTC}] [%t] %logger - %X{code} %msg %n"
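With a size-based pattern such as `logs/app-%i.log`, each roll shifts the existing archives up by one index, so `app-1.log` is always the most recent, and the oldest file is discarded once `archivedFileCount` is reached. The Python sketch below models only that renaming scheme, with stand-in strings for file contents; `roll` is a hypothetical helper, not Logback's implementation.

```python
from typing import Dict, List


def roll(archives: List[str], active_contents: str, archived_file_count: int) -> List[str]:
    """Return the archives after a roll: index 0 is app-1.log (most recent).
    Anything beyond archived_file_count is discarded."""
    return ([active_contents] + archives)[:archived_file_count]


archives: List[str] = []
for n in range(1, 13):  # the active file reaches maxFileSize and rolls 12 times
    archives = roll(archives, f"chunk-{n}", archived_file_count=10)

files: Dict[str, str] = {f"logs/app-{i}.log": c for i, c in enumerate(archives, start=1)}
print(len(files), files["logs/app-1.log"], files["logs/app-10.log"])  # 10 chunk-12 chunk-3
```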

The following is an example of an appender that rolls every minute to gzipped files:

  - type: file
    currentLogFilename: logs/app/app.log
    # Rolled and gzipped every minute
    archivedLogFilenamePattern: logs/app/app-%d{yyyy-MM-dd'T'HH:mm}.log.gz
    # One week using minute files
    archivedFileCount: 10080
    logFormat: "%-6level [%d{\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",UTC}] [%t] %logger - %X{code} %msg %n"

Warning

Log file rolling is event based, so a file will only roll when a new message arrives that would require a roll to happen. This means that if the application is idle for a long period with no log output then the un-rolled file will remain active until a new message arrives to trigger it to roll. For example, if Stroom is unused overnight, then the file containing the last log messages from the night before will not be rolled until new messages arrive in the morning.

For this reason, archivedFileCount should be set to a value greater than the maximum time the application may be idle, measured in roll periods (e.g. minutes for a minute-precision pattern). Otherwise rolled log files may be deleted as soon as they are rolled.
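As a sketch of that sizing rule, the appender below assumes you would rather roll hourly than every minute. Because the pattern has hour precision, archivedFileCount counts hours, so 336 keeps roughly 14 days of files and tolerates idle periods shorter than that. The 14-day figure is an arbitrary illustrative value, not a Stroom default.

```yaml
  - type: file
    currentLogFilename: logs/app/app.log
    # Hour-precision date pattern, so the file is rolled and gzipped every hour
    archivedLogFilenamePattern: logs/app/app-%d{yyyy-MM-dd'T'HH}.log.gz
    # 336 hourly files = 14 days (illustrative value); must exceed the longest expected idle period
    archivedFileCount: 336
    logFormat: "%-6level [%d{\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",UTC}] [%t] %logger - %X{code} %msg %n"
```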

2 - Stroom Configuration

Describes how the Stroom application is configured.

General configuration

The Stroom application is essentially just an executable JAR file that can be run when provided with a configuration file, config.yml. This config file is common to all forms of deployment.

config.yml

Stroom operates on a configuration by exception basis so all configuration properties will have a sensible default value and a property only needs to be explicitly configured if the default value is not appropriate, e.g. for tuning a large scale production deployment or where values are environment specific. As a result config.yml only contains a minimal set of properties. The full tree of properties can be seen in ./config/config-defaults.yml and a schema for the configuration tree (along with descriptions for each property) can be found in ./config/config-schema.yml. These two files can be used as a reference when configuring stroom.

Key Configuration Properties

The following are key properties that would typically be changed for a production deployment. All configuration branches are relative to the appConfig root.

The database name(s), hostname(s), port(s), username(s) and password(s) should be configured using these properties. Typically stroom is configured to keep its statistics data in a separate database from the main stroom database, as configured below.

  commonDbDetails:
    connection:
      jdbcDriverUrl: "jdbc:mysql://localhost:3307/stroom?useUnicode=yes&characterEncoding=UTF-8"
      jdbcDriverUsername: "stroomuser"
      jdbcDriverPassword: "stroompassword1"
  statistics:
    sql:
      db:
        connection:
          jdbcDriverUrl: "jdbc:mysql://localhost:3307/stats?useUnicode=yes&characterEncoding=UTF-8"
          jdbcDriverUsername: "statsuser"
          jdbcDriverPassword: "stroompassword1"

In a clustered deployment each node must be given a node name that is unique within the cluster. This is used to identify nodes in the Nodes screen. It could be the hostname of the node or follow some other naming convention.

  node:
    name: "node1a"

Each node should have its identity on the network configured so that it uses the appropriate FQDNs. The nodeUri hostname is the FQDN of each node and is used by nodes to communicate with each other, so it can be private to the cluster of nodes. The publicUri hostname is the public-facing FQDN for stroom, i.e. the address of a load balancer or Nginx. This is the address that users will enter in their browser.

  nodeUri:
    hostname: "localhost" # e.g. node5.stroomnodes.somedomain
  publicUri:
    hostname: "localhost" # e.g. stroom.somedomain
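The nodeUri and publicUri branches also accept scheme, port and pathPrefix (see the nodeUri and publicUri entries in the Configuration Reference). The following is a minimal sketch for one node, assuming nodes talk to each other over plain HTTP on port 8080 inside the cluster and users reach stroom through HTTPS on port 443. The hostnames and ports are hypothetical example values.

```yaml
appConfig:
  nodeUri:
    # Internal address other nodes use to reach this node (hypothetical values)
    scheme: "http"
    hostname: "node5.stroomnodes.somedomain"
    port: 8080
  publicUri:
    # Public address of the load balancer / Nginx that users browse to (hypothetical values)
    scheme: "https"
    hostname: "stroom.somedomain"
    port: 443
```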

Deploying without Docker

When running stroom without Docker, there are two files used to configure it. The following locations are relative to the stroom home directory, i.e. the root of the distribution zip.

./config/config.yml - Stroom configuration YAML file
./config/scripts.env - Stroom scripts configuration env file

The distribution also includes these files which are helpful when it comes to configuring stroom.

./config/config-defaults.yml - Full version of the config.yml file containing all branches/leaves with default values set. Useful as a reference for the structure and the default values.
./config/config-schema.yml - The schema defining the structure of the config.yml file.

scripts.env

This file is used by the various shell scripts like start.sh, stop.sh, etc. It should not need to be changed unless you want to change where certain log files are written or need to change the Java memory settings.

In a production system it is highly likely that you will need to increase the Java heap size, as the default maximum is only 2G. The heap size settings and any other Java command line options can be set by changing:

JAVA_OPTS="-Xms512m -Xmx2048m"

As part of a docker stack

When stroom is run as part of one of our docker stacks, e.g. stroom_core, there are some additional layers of configuration to take into account, but the configuration is still primarily done using the config.yml file.

Stroom’s config.yml file is found in the stack in ./volumes/stroom/config/ and this is the primary means of configuring Stroom.

The stack also ships with a default config.yml file baked into the docker image. This minimal fallback file (located in /stroom/config-fallback/ inside the container) will be used in the absence of one provided in the docker stack configuration (./volumes/stroom/config/).

The default config.yml file uses environment variable substitution, so some configuration items will be set by environment variables passed into the container by the stack env file and the docker-compose YAML. This approach is useful for configuration values that need to be used by multiple containers, e.g. the public FQDN of Nginx, so they can be configured in one place.

If you need to further customise the stroom configuration then it is recommended to edit the ./volumes/stroom/config/config.yml file. This can either be a simple file with hard coded values or one that uses environment variables for some of its configuration items.
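The following is a minimal sketch of a customised ./volumes/stroom/config/config.yml that mixes environment variable substitution with a hard coded value. It assumes the Dropwizard-style ${VARIABLE:-default} syntax, where the text after :- is used if the variable is unset. STROOM_PUBLIC_HOST and STROOM_NODE_NAME are hypothetical variable names, not ones the stack is known to define.

```yaml
appConfig:
  node:
    # Taken from the (hypothetical) STROOM_NODE_NAME variable, else "node1a"
    name: "${STROOM_NODE_NAME:-node1a}"
  publicUri:
    # Taken from the (hypothetical) STROOM_PUBLIC_HOST variable, else "localhost"
    hostname: "${STROOM_PUBLIC_HOST:-localhost}"
  session:
    # Hard coded value, no substitution involved (illustrative value)
    maxInactiveInterval: "P1D"
```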

The configuration works as follows:

env file (stroom<stack name>.env)
                |
                |
                | environment variable substitution
                |
                v
docker compose YAML (01_stroom.yml)
                |
                |
                | environment variable substitution
                |
                v
Stroom configuration file (config.yml)

Ansible

If you are using Ansible to deploy a stack then it is recommended that all of stroom’s configuration properties are set directly in the config.yml file using a templated version of the file and to NOT use any environment variable substitution. When using Ansible, the Ansible inventory is the single source of truth for your configuration so not using environment variable substitution for stroom simplifies the configuration and makes it clearer when looking at deployed configuration files.
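For illustration, the following is a fragment of a Jinja2-templated config.yml that Ansible's template module might render. Apart from inventory_hostname, which is a built-in Ansible variable, the variable names are hypothetical inventory variables. The rendered file contains literal values and no ${...} placeholders.

```yaml
# config.yml.j2: rendered by Ansible, so the deployed file holds literal values
appConfig:
  node:
    # Hypothetical per-host inventory variable
    name: "{{ stroom_node_name }}"
  nodeUri:
    # Assumes inventory hostnames are the nodes' FQDNs
    hostname: "{{ inventory_hostname }}"
  publicUri:
    # Hypothetical group-wide inventory variable
    hostname: "{{ stroom_public_fqdn }}"
```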

Stroom-ansible has an example inventory for a single node stroom stack deployment. The group_vars/all file shows how values can be set into the env file.

Configuration Reference

appConfig:
  haltBootOnConfigValidationFailure: true
  ...

The following sections document each level one branch of appConfig, e.g. appConfig.receive.

A common structure within the configuration is the Cache Configuration. Typically any property whose name ends in ...Cache has this structure.
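As an example of the common cache structure, the sketch below overrides two fields of feed.feedDocCache. The field names are taken from the visualisationDocCache defaults shown in the dashboard section, durations use the same ISO-8601 form (e.g. PT10M) as the rest of the file, and the values themselves are illustrative. Because stroom works on a configuration by exception basis, fields that are not listed are assumed to keep their defaults.

```yaml
appConfig:
  feed:
    feedDocCache:
      # Evict an entry 10 minutes after it was last read (illustrative value)
      expireAfterAccess: "PT10M"
      # Hold at most 500 entries (illustrative value)
      maximumSize: 500
```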

Each functional area/module in Stroom has its own logical database connection. Any property with the name db is a standard structure for configuring a database connection. See Common Database Configuration.

This allows each module to, in theory, connect to a separate database, be they on one host or multiple. In practice most Stroom deployments will use one database connection for all modules. See commonDbDetails for details on how to use one shared database configuration.

`activity`

appConfig:
  activity:
    db: # Common database configuration branch

`analytics`

appConfig:
  analytics:
    db: # Common database configuration branch
    duplicateCheckStore:
      lmdb: # Common LMDB structure
        localDir: "lmdb/duplicate_check"
    emailConfig:
      fromAddress: "noreply@stroom"
      fromName: "Stroom Analytics"
      smtp:
        host: "localhost"
        password: null
        port: 2525
        transport: "plain"
        username: null
    executionHistoryRetention: "P10D"
    resultStore:
      lmdb: # Common LMDB structure
        localDir: "lmdb/analytic_store"
      maxPayloadSize: "1G"
      maxPutsBeforeCommit: 10000
      maxSortedItems: 500000
      maxStringFieldLength: 1000
      minPayloadSize: "1M"
      offHeapResults: true
      valueQueueSize: 10000
    streamingAnalyticCache: # Common cache structure
    timezone: "UTC"

`annotation`

appConfig:
  annotation:
    annotationFeedCache:
    annotationTagCache:
    createText: "Create Annotation"
    db:
    defaultRetentionPeriod: "5y"
    physicalDeleteAge: "P7D"
    standardComments: []

`askStroomAi`

appConfig:
  askStroomAi:
    chatMemory:
      timeToLive:
        time: 1
        timeUnit: "HOURS"
      tokenLimit: 30000
    tableSummary:
      maximumBatchSize: 16384
      maximumTableInputRows: 100

`autoContentCreation`

appConfig:
  autoContentCreation:

    #An optional group to add the group defined by groupTemplate to.
    #The value of this property is the name of a group. It can be the same 
    #as groupParentGroupName if required. 
    #It allows all the templated groups to belong to a common group for easier 
    #permission management.
    additionalGroupParentGroupName: "Data Feed Developer"

    #If set, when Stroom auto-creates a feed, it will create an additional user group with a 
    #name derived from this template. This is in addition to the user group defined by 'groupTemplate'.
    #If not set, only the latter user group will be created. Default value is 'grp-${accountid}-sandbox'. 
    #If this property is set in the YAML file, use single quotes to prevent the 
    #variables being expanded when the config file is loaded.
    additionalGroupTemplate: "grp-${accountid}-sandbox"

    #The subjectId of the user/group who the auto-created content will be created by, 
    #typically a group with administrator privileges. 
    #This user/group must have the permission to create all content required. It will also be the 
    #'run as' user for created pipeline processor filters.
    createAsSubjectId: "Administrators"

    #The type of the entity represented by createAsSubjectId, i.e. 'USER' or 'GROUP'.
    #It is possible for content to be owned by a group rather than individual users.
    createAsType: "GROUP"

    #The templated path to a folder in the Stroom explorer tree where Stroom will auto-create 
    #content. If it doesn't exist it will be created. Content will be created in a sub-folder of this 
    #folder with a name derived from the system name of the received data. By default this is 
    #'Feeds/${accountid}'.
    #If this property is set in the YAML file, use single quotes to prevent the 
    #variables being expanded when the config file is loaded.
    destinationExplorerPathTemplate: "/Feeds/${accountid}"

    #An optional templated sub-path of 'destinationExplorerPathTemplate'. If set, copied dependencies (e.g.
    #XSLT filters, Test Converters, etc.) will be created in the sub-directory defined by this template. 
    #If not set, that content will be created in the directory 
    destinationExplorerSubPathTemplate: "sandbox"

    #Whether the auto-creation of content on data receipt is enabled or not. 
    #If enabled, Stroom will automatically create content such as Feeds/XSLTs/Pipelines on receipt of 
    #a data stream. The property 'templatesPath' will contain content to be used as templates for 
    #auto-creation. Content will only be created if a Content Template rule matches the attributes 
    #on the incoming data.
    enabled: false

    #An optional group to add the group defined by groupTemplate to.
    #The value of this property is the name of a group. 
    #It allows all the templated groups to belong to a common group for easier 
    #permission management.
    groupParentGroupName: "Data Feed Reader"

    #When Stroom auto-creates a feed, it will create a user group with a 
    #name derived from this template. Default value is 'grp-${accountid}'. 
    #If this property is set in the YAML file, use single quotes to prevent the 
    #variables being expanded when the config file is loaded.
    groupTemplate: "grp-${accountid}"

    #The header keys available for use when matching a request to a content template. 
    #Must be in lower case.
    templateMatchFields:
    - "accountid"
    - "accountname"
    - "component"
    - "feed"
    - "format"
    - "schema"
    - "schemaversion"
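The sketch below enables auto content creation and follows the advice in the comments above by single-quoting the templated values, so that ${accountid} is left for stroom to expand rather than being substituted when config.yml is loaded. The template values are the documented defaults; only enabled differs.

```yaml
appConfig:
  autoContentCreation:
    enabled: true
    # Single quotes keep ${accountid} literal until stroom expands it per feed
    groupTemplate: 'grp-${accountid}'
    additionalGroupTemplate: 'grp-${accountid}-sandbox'
    destinationExplorerPathTemplate: '/Feeds/${accountid}'
```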

`byteBufferPool`

appConfig:
  byteBufferPool:
    blockOnExhaustedPool: false
    pooledByteBufferCounts:
      1: 50
      10: 50
      100: 50
      1000: 50
      10000: 50
      100000: 10
      1000000: 3
    warningThresholdPercentage: 90

`cluster`

appConfig:
  cluster:
    clusterCallIgnoreSSLHostnameVerifier: true
    clusterCallReadTimeout: "PT30S"
    clusterCallUseLocal: true
    clusterResponseTimeout: "PT30S"

`clusterLock`

appConfig:
  clusterLock:
    db:
    lockTimeout: "PT10M"

`commonDbDetails`

appConfig:
  commonDbDetails:

commonDbDetails has the same structure as all the db branches. It is used for defining a database connection configuration that will be used for all stroom functional areas/modules unless the module has explicitly configured its db configuration branch.
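The following is a sketch of that fallback behaviour: every module uses commonDbDetails except activity, whose db branch is set explicitly. The db branch is assumed to have the same connection structure as commonDbDetails, as stated above. The hostnames, database names and credentials are hypothetical.

```yaml
appConfig:
  # Shared connection used by every module whose own db branch is not set
  commonDbDetails:
    connection:
      jdbcDriverUrl: "jdbc:mysql://db-host:3307/stroom?useUnicode=yes&characterEncoding=UTF-8"
      jdbcDriverUsername: "stroomuser"
      jdbcDriverPassword: "stroompassword1"
  activity:
    # Explicitly configured, so this module ignores commonDbDetails
    db:
      connection:
        jdbcDriverUrl: "jdbc:mysql://activity-db-host:3307/activity?useUnicode=yes&characterEncoding=UTF-8"
        jdbcDriverUsername: "activityuser"
        jdbcDriverPassword: "activitypassword1"
```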

`contentPackImport`

appConfig:
  contentPackImport:
    enabled: false
    importAsSubjectId: "Administrators"
    importAsType: "GROUP"
    importDirectory: "content_pack_import"

`contentStore`

appConfig:
  contentStore:
    urls:
    - "https://raw.githubusercontent.com/gchq/stroom-content/refs/heads/master/source/content-store.yml"

`credentials`

appConfig:
  credentials:
    db:
    keyStoreCachePath: "${stroom.home}/keystores"

`crossModule`

appConfig:
  crossModule:
    db:

`dashboard`

appConfig:
  dashboard:
    visualisationDocCache:
      expireAfterAccess: null
      expireAfterWrite: "PT10M"
      maximumSize: 100
      refreshAfterWrite: null
      statisticsMode: "INTERNAL"

`data`

appConfig:
  data:
    filesystemVolume:
      createDefaultStreamVolumesOnStart: true
      defaultStreamVolumeFilesystemUtilisation: 0.9
      defaultStreamVolumeGroupName: "Default Volume Group"
      defaultStreamVolumePaths:
      - "volumes/default_stream_volume"
      feedPathCache:
      findOrphanedMetaBatchSize: 7000
      maxVolumeStateAge: "PT30S"
      metaTypeExtensions:
        Detections: "dtxn"
        Error: "err"
        Events: "evt"
        Raw Events: "revt"
        Raw Reference: "rref"
        Records: "rec"
        Reference: "ref"
        Test Events: "tevt"
        Test Reference: "tref"
      typePathCache:
      volumeCache:
      volumeSelector: "RoundRobin"
    meta:
      dataFormats:
      - "FIXED_WIDTH_NO_HEADER"
      - "INI"
      - "CSV"
      - "JSON"
      - "TEXT"
      - "XML_FRAGMENT"
      - "YAML"
      - "PSV_NO_HEADER"
      - "PSV"
      - "CSV_NO_HEADER"
      - "XML"
      - "TSV"
      - "SYSLOG"
      - "TSV_NO_HEADER"
      - "FIXED_WIDTH"
      - "TOML"
      db:
      metaFeedCache:
      metaProcessorCache:
      metaStatusUpdateBatchSize: 0
      metaTypeCache:
      metaTypes:
      - "Context"
      - "Raw Reference"
      - "Events"
      - "Raw Events"
      - "Reference"
      - "Error"
      - "Test Events"
      - "Test Reference"
      - "Detections"
      - "Meta Data"
      - "Records"
      metaValue:
        addAsync: true
        deleteAge: "P30D"
        deleteBatchSize: 500
        flushBatchSize: 500
      rawMetaTypes:
      - "Raw Reference"
      - "Raw Events"
    retention:
      deleteBatchSize: 1000
      useQueryOptimisation: true
    store:
      db:
      deleteBatchSize: 1000
      deleteFailureThreshold: 100
      deletePurgeAge: "P7D"
      fileSystemCleanBatchSize: 20
      fileSystemCleanDeleteOut: false
      fileSystemCleanOldAge: "P1D"

`docstore`

appConfig:
  docstore:
    db:

`elastic`

appConfig:
  elastic:
    client:
      maxConnections: 30
      maxConnectionsPerRoute: 10
    indexCache:
    indexClientCache:
    indexing:
      initialRetryBackoffPeriodMs: 1000
      maxNestedElementDepth: 10
      retryCount: 5
    search:
      highlight: true
      scrollDuration: "PT1M"
      storeSize: "1000000,100,10,1"
      suggestions:
        enabled: true

`explorer`

appConfig:
  explorer:
    db:
    dependencyWarningsEnabled: false
    docRefInfoCache:
    suggestedTags:
    - "reference-loader"
    - "dynamic"
    - "extraction"

`export`

appConfig:
  export:
    enabled: false

`feed`

appConfig:
  feed:
    feedDocCache:
    feedNamePattern: "^[A-Z0-9_-]{3,}$"
    unknownClassification: "UNKNOWN CLASSIFICATION"

`gitRepo`

appConfig:
  gitRepo:
    db:
    localDir: "git_repo"

`index`

appConfig:
  index:
    db:
    indexCache:
    indexFieldCache:
    ramBufferSizeMB: 1024
    writer:
      activeShardCache:
      cache:
        coreItems: 50
        maxItems: 100
        minItems: 0
        timeToIdle: "PT0S"
        timeToLive: "PT0S"
      indexShardWriterCache:
      slowIndexWriteWarningThreshold: "PT1S"

`job`

appConfig:
  job:
    db:
    enableJobsOnBootstrap: false
    enabled: true
    executionInterval: "10s"

`kafka`

appConfig:
  kafka:
    kafkaConfigDocCache:
      expireAfterAccess: "PT10S"
      expireAfterWrite: null
      maximumSize: 1000
      refreshAfterWrite: null
      statisticsMode: "INTERNAL"
    skeletonConfigContent: ".........TRUNCATED..........."

`lifecycle`

appConfig:
  lifecycle:
    enabled: true

`lmdbLibrary`

appConfig:
  lmdbLibrary:
    providedSystemLibraryPath: null
    systemLibraryExtractDir: "lmdb_library"

`logging`

appConfig:
  logging:
    deviceCache:
    logEveryRestCallEnabled: false
    maxDataElementStringLength: 500
    maxListElements: 5
    omitRecordDetailsLoggingEnabled: true

`node`

appConfig:
  node:
    db:
    name: "tba"
    status:
      heapHistogram:
        classNameMatchRegex: "^stroom\\..*$"
        classNameReplacementRegex: "((?<=\\$Proxy)[0-9]+|(?<=\\$\\$)[0-9a-f]+|(?<=\\\
          $\\$Lambda\\$)[0-9]+\\/[0-9]+)"

`nodeUri`

appConfig:
  nodeUri:
    hostname: null
    pathPrefix: null
    port: null
    scheme: null

`path`

appConfig:
  path:
    home: null
    temp: null

`pipeline`

appConfig:
  pipeline:
    appender:
      maxActiveDestinations: 100
    documentPermissionCache:
    httpClientCache:
    parser:
      cache:
      secureProcessing: true
    pipelineDataCache:
    referenceData:
      effectiveStreamCache:
      lmdb:
        localDir: "reference_data"
        readerBlockedByWriter: true
      loadingLockStripes: 2048
      maxPurgeDeletesBeforeCommit: 200000
      maxPutsBeforeCommit: 200000
      metaIdToRefStoreCache:
        expireAfterAccess: "PT1H"
        expireAfterWrite: null
        maximumSize: 1000
        refreshAfterWrite: null
        statisticsMode: "INTERNAL"
      purgeAge: "P30D"
      stagingLmdb:
        localDir: "reference_staging_data"
        maxReaders: 5
        maxStoreSize: "10G"
        readAheadEnabled: true
        readerBlockedByWriter: false
    xmlSchema:
      cache:
        expireAfterAccess: "PT10M"
        expireAfterWrite: null
        maximumSize: 1000
        refreshAfterWrite: null
        statisticsMode: "INTERNAL"
    xslt:
      cache:
        expireAfterAccess: "PT10M"
        expireAfterWrite: null
        maximumSize: 1000
        refreshAfterWrite: null
        statisticsMode: "INTERNAL"
      maxElements: 1000000

`planb`

appConfig:
  planb:
    minTimeToKeepEnvOpen: "PT1M"
    minTimeToKeepSnapshots: "PT10M"
    nodeList: []
    path: "${stroom.home}/planb"
    snapshotRetryFetchInterval: "PT1M"
    stateDocCache:

`processor`

appConfig:
  processor:
    assignTasks: true
    createTasksBeyondProcessLimit: true
    databaseMultiInsertMaxBatchSize: 500
    db:
    deleteAge: "P1D"
    disownDeadTasksAfter: "PT10M"
    fillTaskQueue: true
    processorCache:
    processorFeedCache:
    processorFilterCache:
    processorNodeCache:
    queueSize: 1000
    skipNonProducingFiltersDuration: "PT10S"
    taskCreationThreadCount: 5
    tasksToCreate: 1000
    waitToQueueTasksDuration: "PT10S"

`properties`

appConfig:
  properties:
    db:

`publicUri`

appConfig:
  publicUri:
    hostname: null
    pathPrefix: null
    port: null
    scheme: "https"

`queryDataSource`

appConfig:
  queryDataSource:
    db:

`queryHistory`

appConfig:
  queryHistory:
    daysRetention: 365
    db:
    itemsRetention: 100

`receiptPolicy`

appConfig:
  receiptPolicy:
    obfuscatedFields:
    - "AccountId"
    - "AccountName"
    - "Component"
    - "Feed"
    - "ReceivedPath"
    - "RemoteDN"
    - "RemoteHost"
    - "System"
    - "UploadUserId"
    - "UploadUsername"
    - "X-Forwarded-For"
    obfuscationHashAlgorithm: "SHA2_512"
    receiptRulesInitialFields:
      AccountId: "Text"
      Component: "Text"
      Compression: "Text"
      content-length: "Text"
      ContextEncoding: "Text"
      ContextFormat: "Text"
      EffectiveTime: "Date"
      Encoding: "Text"
      Environment: "Text"
      Feed: "Text"
      Format: "Text"
      ReceiptId: "Text"
      ReceiptIdPath: "Text"
      ReceivedPath: "Text"
      ReceivedTime: "Date"
      ReceivedTimeHistory: "Text"
      RemoteCertExpiry: "Date"
      RemoteDN: "Text"
      RemoteHost: "Text"
      RemoteAddress: "Text"
      Schema: "Text"
      SchemaVersion: "Text"
      System: "Text"
      Type: "Text"
      UploadUsername: "Text"
      UploadUserId: "Text"
      user-agent: "Text"
      X-Forwarded-For: "Text"

`receive`

appConfig:
  receive:

The receive configuration branch is common to both Stroom and Stroom Proxy. See Receive Configuration for more details.

`s3`

appConfig:
  s3:
    s3ConfigDocCache:
    skeletonConfigContent: "{\n  \"credentialsProviderType\" : \"DEFAULT\",\n  \"\
      region\" : \"eu-west-2\",\n  \"bucketName\" : \"XXXX-eu-west-2\",\n  \"keyPattern\"\
      \ : \"${type}/${year}/${month}/${day}/${idPath}/${feed}/${idPadded}.zip\"\n\
      }\n"

`search`

appConfig:
  search:
    extraction:
      extractionDelayMs: 100
      maxStoredDataQueueSize: 1000
      maxStreamEventMapSize: 1000000
      maxThreadsPerTask: 5
    maxBooleanClauseCount: 1024
    maxStoredDataQueueSize: 1000
    resultStore:
      lmdb:
        localDir: "search_results"
        maxReaders: 10
        maxStoreSize: "10G"
        readAheadEnabled: true
      map:
        minUntrimmedSize: 100000
        trimmedSizeLimit: 500000
      maxPayloadSize: "1G"
      maxPutsBeforeCommit: 10000
      maxSortedItems: 500000
      maxStringFieldLength: 1000
      minPayloadSize: "1M"
      offHeapResults: true
      valueQueueSize: 10000
    shard:
      indexShardSearcherCache:
      maxDocIdQueueSize: 1000000
      maxThreadsPerTask: 5
      remoteSearchResultCache:

`security`

appConfig:
  security:
    authentication:
      apiKeyCache:
      authenticationStateCache:
      maxApiKeyExpiryAge: "P365D"
      openId:
        allowedAudiences: []
        audienceClaimRequired: false
        authEndpoint: null
        clientCredentialsScopes:
        - "openid"
        clientId: null
        clientSecret: null
        expectedSignerPrefixes: []
        formTokenRequest: true
        fullNameClaimTemplate: "${name}"
        identityProviderType: "INTERNAL_IDP"
        issuer: null
        jwksUri: null
        logoutEndpoint: null
        logoutRedirectParamName: "post_logout_redirect_uri"
        openIdConfigurationEndpoint: null
        publicKeyUriPattern: "https://public-keys.auth.elb.${awsRegion}.amazonaws.com/${keyId}"
        requestScopes:
        - "openid"
        - "email"
        tokenEndpoint: null
        uniqueIdentityClaim: "sub"
        userDisplayNameClaim: "preferred_username"
        validIssuers: []
      preventLogin: false
    authorisation:
      appPermissionIdCache:
      db:
      docTypeIdCache:
      userAppPermissionsCache:
      userByUuidCache:
      userCache:
      userDocumentPermissionsCache:
      userGroupsCache:
      userInfoByUuidCache:
    crypto:
      secretEncryptionKey: ""
    identity:
      allowCertificateAuthentication: false
      autoCreateAdminAccountOnBoot: false
      certificateCnCaptureGroupIndex: 1
      certificateCnPattern: ".*\\((.*)\\)"
      db:
      email:
        allowPasswordResets: false
        fromAddress: "noreply@stroom"
        fromName: "Stroom User Accounts"
        passwordResetSubject: "Password reset for Stroom"
        passwordResetText: "A password reset has been requested for this email address.\
          \ Please visit the following URL to reset your password: %s."
        passwordResetUrl: "/s/resetPassword/?user=%s&token=%s"
        smtp:
          host: "localhost"
          password: null
          port: 2525
          transport: "plain"
          username: null
      failedLoginLockThreshold: 3
      openid:
        accessCodeCache:
        refreshTokenCache:
      passwordPolicy:
        allowPasswordResets: true
        forcePasswordChangeOnFirstLogin: true
        mandatoryPasswordChangeDuration: "P90D"
        minimumPasswordLength: 8
        minimumPasswordStrength: 3
        neverUsedAccountDeactivationThreshold: "P30D"
        passwordComplexityRegex: ".*"
        passwordPolicyMessage: "To conform with our Strong Password policy, you are\
          \ required to use a sufficiently strong password. Password must be more\
          \ than 8 characters."
        unusedAccountDeactivationThreshold: "P90D"
      token:
        accessTokenExpiration: "PT1H"
        algorithm: "RS256"
        defaultApiKeyExpiration: "P365D"
        emailResetTokenExpiration: "PT10M"
        idTokenExpiration: "PT1H"
        jwsIssuer: "stroom"
        refreshTokenExpiration: "P30D"
    webContent:
      contentSecurityPolicy: "default-src 'self'; script-src 'self' 'unsafe-eval'\
        \ 'unsafe-inline'; img-src 'self' data:; style-src 'self' 'unsafe-inline';\
        \ frame-ancestors 'self';"
      contentTypeOptions: "nosniff"
      frameOptions: "sameorigin"
      strictTransportSecurity: "max-age=31536000; includeSubDomains; preload"
      xssProtection: "1; mode=block"

`session`

appConfig:
  session:
    maxInactiveInterval: "P7D"

`sessionCookie`

appConfig:
  sessionCookie:
    httpOnly: true
    sameSite: "STRICT"
    secure: true

`solr`

appConfig:
  solr:
    indexCache:
    indexClientCache:
    search:
      maxBooleanClauseCount: 1024
      maxStoredDataQueueSize: 1000

`state`

appConfig:
  state:
    scyllaDbDocCache:
    sessionCache:
    stateDocCache:

`statistics`

appConfig:
  statistics:
    hbase:
      docRefType: "StroomStatsStore"
      eventsPerMessage: 100
      kafkaConfigUuid: null
      kafkaTopics:
        count: "statisticEvents-Count"
        value: "statisticEvents-Value"
    internal:
      benchmarkCluster:
      - type: "StatisticStore"
        uuid: "946a88c6-a59a-11e6-bdc4-0242ac110002"
        name: "Benchmark-Cluster Test"
      - type: "StroomStatsStore"
        uuid: "2503f703-5ce0-4432-b9d4-e3272178f47e"
        name: "Benchmark-Cluster Test"
      cpu:
      - type: "StatisticStore"
        uuid: "af08c4a7-ee7c-44e4-8f5e-e9c6be280434"
        name: "CPU"
      - type: "StroomStatsStore"
        uuid: "1edfd582-5e60-413a-b91c-151bd544da47"
        name: "CPU"
      enabledStoreTypes:
      - "StatisticStore"
      eventsPerSecond:
      - type: "StatisticStore"
        uuid: "a9936548-2572-448b-9d5b-8543052c4d92"
        name: "EPS"
      - type: "StroomStatsStore"
        uuid: "cde67df0-0f77-45d3-b2c0-ee8bb7b3c9c6"
        name: "EPS"
      heapHistogramBytes:
      - type: "StatisticStore"
        uuid: "934a1600-b456-49bf-9aea-f1e84025febd"
        name: "Heap Histogram Bytes"
      - type: "StroomStatsStore"
        uuid: "b0110ab4-ac25-4b73-b4f6-96f2b50b456a"
        name: "Heap Histogram Bytes"
      heapHistogramInstances:
      - type: "StatisticStore"
        uuid: "e4f243b8-2c70-4d6e-9d5a-16466bf8764f"
        name: "Heap Histogram Instances"
      - type: "StroomStatsStore"
        uuid: "bdd933a4-4309-47fd-98f6-1bc2eb555f20"
        name: "Heap Histogram Instances"
      memory:
      - type: "StatisticStore"
        uuid: "77c09ccb-e251-4ca5-bca0-56a842654397"
        name: "Memory"
      - type: "StroomStatsStore"
        uuid: "d8a7da4f-ef6d-47e0-b16a-af26367a2798"
        name: "Memory"
      metaDataStreamSize:
      - type: "StatisticStore"
        uuid: "946a8814-a59a-11e6-bdc4-0242ac110002"
        name: "Meta Data-Stream Size"
      - type: "StroomStatsStore"
        uuid: "3b25d63b-5472-44d0-80e8-8eea94f40f14"
        name: "Meta Data-Stream Size"
      metaDataStreamsReceived:
      - type: "StatisticStore"
        uuid: "946a87bc-a59a-11e6-bdc4-0242ac110002"
        name: "Meta Data-Streams Received"
      - type: "StroomStatsStore"
        uuid: "5535f493-29ae-4ee6-bba6-735aa3104136"
        name: "Meta Data-Streams Received"
      pipelineStreamProcessor:
      - type: "StatisticStore"
        uuid: "946a80fc-a59a-11e6-bdc4-0242ac110002"
        name: "PipelineStreamProcessor"
      - type: "StroomStatsStore"
        uuid: "efd9bad4-0bab-460f-ae98-79e9717deeaf"
        name: "PipelineStreamProcessor"
      refDataStoreEntryCount:
      - type: "StatisticStore"
        uuid: "f1587262-9cbc-46b4-80eb-51deb011b2c1"
        name: "Reference Data Store Entry Count"
      - type: "StroomStatsStore"
        uuid: "TODO"
        name: "Reference Data Store Entry Count"
      refDataStoreSize:
      - type: "StatisticStore"
        uuid: "e57959bf-0b2d-4008-98a7-ffcae4bbc4bb"
        name: "Reference Data Store Size"
      - type: "StroomStatsStore"
        uuid: "TODO"
        name: "Reference Data Store Size"
      refDataStoreStreamCount:
      - type: "StatisticStore"
        uuid: "0dfd4e00-e068-4667-9c60-d3f6163a6c04"
        name: "Reference Data Store Stream Count"
      - type: "StroomStatsStore"
        uuid: "TODO"
        name: "Reference Data Store Stream Count"
      searchResultsStoreCount:
      - type: "StatisticStore"
        uuid: "35d60e7d-f11a-45c9-981d-16d8ddda081e"
        name: "Search Results Store Count"
      - type: "StroomStatsStore"
        uuid: "TODO"
        name: "Search Results Store Count"
      searchResultsStoreSize:
      - type: "StatisticStore"
        uuid: "de5b831d-3b7e-4bb5-836f-2f438ec30568"
        name: "Search Results Store Size"
      - type: "StroomStatsStore"
        uuid: "TODO"
        name: "Search Results Store Size"
      streamTaskQueueSize:
      - type: "StatisticStore"
        uuid: "946a7f0f-a59a-11e6-bdc4-0242ac110002"
        name: "Stream Task Queue Size"
      - type: "StroomStatsStore"
        uuid: "4ce8d6e7-94be-40e1-8294-bf29dd089962"
        name: "Stream Task Queue Size"
      volumes:
      - type: "StatisticStore"
        uuid: "ac4d8d10-6f75-4946-9708-18b8cb42a5a3"
        name: "Volumes"
      - type: "StroomStatsStore"
        uuid: "60f4f5f0-4cc3-42d6-8fe7-21a7cec30f8e"
        name: "Volumes"
    sql:
      dataSourceCache:
      db:
      docRefType: "StatisticStore"
      inMemAggregatorPoolSize: 10
      inMemFinalAggregatorSizeThreshold: 1000000
      inMemPooledAggregatorAgeThreshold: "PT5M"
      inMemPooledAggregatorSizeThreshold: 1000000
      maxProcessingAge: null
      search:
        fetchSize: 5000
        maxResults: 100000
      slowQueryWarningThreshold: "PT1S"
      statisticAggregationBatchSize: 1000000
      statisticAggregationStageTwoBatchSize: 200000
      statisticFlushBatchSize: 8000

`ui`

appConfig:
  ui:
    aboutHtml: "<h1>About Stroom</h1><p>Stroom is designed to receive data from multiple\
      \ systems.</p>"
    activity:
      chooseOnStartup: false
      editorBody: "Activity Code:</br><input type=\"text\" name=\"code\"></input></br></br>Activity\
        \ Description:</br><textarea rows=\"4\" style=\"width:100%;height:80px\" name=\"\
        description\" validation=\".{80,}\" validationMessage=\"The activity description\
        \ must be at least 80 characters long.\" ></textarea>Explain what the activity\
        \ is"
      editorTitle: "Edit Activity"
      enabled: false
      managerTitle: "Choose Activity"
    analyticUiDefaultConfig:
      defaultBodyTemplate: "<!DOCTYPE html>\n<html lang=\"en\">\n<meta charset=\"\
        UTF-8\" />\n<title>Detector '{{ detectorName | escape }}' Alert</title>\n\
        <body>\n  <p>Detector <em>{{ detectorName | escape }}</em> {{ detectorVersion\
        \ | escape }} fired at {{ detectTime | escape }}</p>\n\n  {%- if (values |\
        \ length) > 0 -%}\n  <p>Detail: {{ headline | escape }}</p>\n  <ul>\n    {%\
        \ for key, val in values | dictsort -%}\n      <li><strong>{{ key | escape\
        \ }}</strong>: {{ val | escape }}</li>\n    {% endfor %}\n  </ul>\n  {% endif\
        \ -%}\n\n  {%- if (linkedEvents | length) > 0 -%}\n  <p>Linked Events:</p>\n\
        \  <ul>\n    {% for linkedEvent in linkedEvents -%}\n      <li>Environment:\
        \ {{ linkedEvent.stroom | escape }}, Stream ID: {{ linkedEvent.streamId |\
        \ escape }}, Event ID: {{ linkedEvent.eventId | escape }}</li>\n    {% endfor\
        \ %}\n  </ul>\n  {% endif %}\n</body>\n"
      defaultSubjectTemplate: "Detector '{{ detectorName | escape }}' Alert"
    defaultApiKeyHashAlgorithm: "SHA3_256"
    defaultMaxResults: "1000000,100,10,1"
    helpSubPathDocumentation: "/user-guide/content/documentation/"
    helpSubPathExpressions: "/user-guide/dashboards/expressions/"
    helpSubPathJobs: "/reference-section/jobs/"
    helpSubPathProperties: "/user-guide/properties/"
    helpSubPathQuickFilter: "/user-guide/finding-things/"
    helpSubPathStroomQueryLanguage: "/user-guide/dashboards/stroom-query-language/"
    helpUrl: "https://gchq.github.io/stroom-docs/7.5/docs"
    htmlTitle: "Stroom"
    maxEditorCompletionEntries: 1000
    namePattern: "^[a-zA-Z0-9_\\- \\.\\(\\)]{1,}$"
    nestedIndexFieldsDelimiterPattern: "[.:]"
    nodeMonitoring:
      pingMaxThreshold: 500
      pingWarnThreshold: 100
    oncontextmenu: "return false;"
    process:
      defaultRecordLimit: 1000000
      defaultTimeLimit: 30
    query:
      dashboardPipelineSelectorIncludedTags:
      - "extraction"
      indexPipelineSelectorIncludedTags:
      - "extraction"
      infoPopup:
        enabled: false
        title: "Please Provide Query Info"
        validationRegex: "^[\\s\\S]{3,}$"
      viewPipelineSelectorIncludedTags:
      - "extraction"
    referencePipelineSelectorIncludedTags:
    - "reference-loader"
    reportUiDefaultConfig:
      defaultBodyTemplate: "<!DOCTYPE html>\n<html lang=\"en\">\n<meta charset=\"\
        UTF-8\" />\n<title>Report '{{ reportName | escape }}'</title>\n<body>\n <p><em>Report:\
        \ {{ reportName | escape }}</em>  executed for {{ effectiveExecutionTime |\
        \ escape }} on {{ executionTime | escape }}</p>\n <p><em>Description:</em>\
        \  {{ description | escape }}</p>\n</body>\n"
      defaultSubjectTemplate: "Report '{{ reportName | escape }}'"
    source:
      maxCharactersInPreviewFetch: 30000
      maxCharactersPerFetch: 80000
      maxCharactersToCompleteLine: 10000
      maxHexDumpLines: 1000
    splash:
      body: "<h1>About Stroom</h1><p>Stroom is designed to receive data from multiple\
        \ systems.</p>"
      enabled: false
      title: "Splash Screen"
      version: "v0.1"
    theme:
      labelColours: "TEST1=#FF0000,TEST2=#FF9900"
    welcomeHtml: "<h1>About Stroom</h1><p>Stroom is designed to receive data from\
      \ multiple systems.</p>"

`uiUri`

appConfig:
  uiUri:
    hostname: null
    pathPrefix: null
    port: null
    scheme: "https"

`volumes`

appConfig:
  volumes:
    createDefaultIndexVolumesOnStart: true
    defaultIndexVolumeFilesystemUtilisation: 0.9
    defaultIndexVolumeGroupName: "Default Volume Group"
    defaultIndexVolumeGroupPaths:
    - "volumes/default_index_volume"
    volumeSelector: "RoundRobin"
    volumeSelectorCache:

Common Configuration Structures

The following are configuration branch structures that are used in multiple places in Stroom’s configuration.

Common Database Configuration

The following shows the structure of the common database configuration that features in many of the above configuration branches. Any property with the name db will follow this structure.

    db:
      connection:
        jdbcDriverClassName: null
        jdbcDriverPassword: null
        jdbcDriverUrl: null
        jdbcDriverUsername: null
      connectionPool:
        cachePrepStmts: false
        connectionTimeout: "PT30S"
        idleTimeout: "PT10M"
        leakDetectionThreshold: "PT0S"
        maxLifetime: "PT30M"
        maxPoolSize: 30
        minimumIdle: 10
        prepStmtCacheSize: 25
        prepStmtCacheSqlLimit: 256
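This is for illustration only. A db branch usually needs only its connection details set, and the pool can be left at its defaults. The sketch below assumes a MySQL database reached through MySQL Connector/J. The host, database name, user and password are hypothetical placeholders.

```yaml
db:
  connection:
    # Assumption: MySQL accessed via the MySQL Connector/J driver.
    jdbcDriverClassName: "com.mysql.cj.jdbc.Driver"
    # Hypothetical host, port and database name.
    jdbcDriverUrl: "jdbc:mysql://db-host.example.com:3306/stroom"
    # Hypothetical credentials.
    jdbcDriverUsername: "stroomuser"
    jdbcDriverPassword: "...PROVIDE PASSWORD..."
  connectionPool:
    # Only override the pool settings that need tuning; the rest keep their defaults.
    maxPoolSize: 50
```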

Common LMDB Configuration

lmdb:
  # The directory where the LMDB files will be persisted
  localDir: "lmdb/xxxxxx"
  # The maximum number of concurrent readers
  maxReaders: 10
  # The maximum size the store can grow to
  maxStoreSize: "10G"
  # If true, LMDB will read additional pages of data to optimistically hold
  # in the page cache.
  readAheadEnabled: true
  # If true, readers will be blocked when other threads are writing.
  # This can prevent excessive store size growth if reading and writing happens concurrently.
  readerBlockedByWriter: true
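The following sketches how an lmdb branch might be overridden so that the store lives on a separate, larger disk. The path and sizes are hypothetical. Properties that are omitted are expected to keep the defaults shown above.

```yaml
lmdb:
  # Hypothetical absolute path on a dedicated local disk.
  localDir: "/data/stroom/lmdb/xxxxxx"
  # Allow more concurrent readers than the default of 10 (illustrative value).
  maxReaders: 20
  # Hypothetical larger limit, same size format as the default "10G".
  maxStoreSize: "50G"
```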

3 - Stroom Proxy Configuration

Describes how the Stroom-Proxy application is configured.

YAML Configuration File

The Stroom-proxy application is essentially just an executable JAR file that can be run when provided with a configuration file, config.yml. This configuration file is common to all forms of deployment.

As Stroom-proxy does not have a user interface, the config.yml file is the only way of configuring Stroom-Proxy. As with stroom, the config.yml file is split into three sections using these keys:

server - Configuration of the web server, e.g. ports, paths, request logging. See Server Configuration
logging - Configuration of application logging. See Logging Configuration
proxyConfig - Stroom-Proxy specific configuration

See also Properties for more details on the structure of the config.yml file and the supported data types.

Stroom-Proxy operates on a configuration by exception basis. As far as possible, every configuration property has a sensible default value. A property only needs to be set explicitly if the default is not appropriate (e.g. when tuning a large scale production deployment) or if the value is environment specific (e.g. the hostname of a forward destination).

As a result, the config.yml shipped with Stroom-Proxy contains only a minimal set of properties. The full tree of properties can be seen in ./config/config-defaults.yml. A schema for the configuration tree, with descriptions for each property, can be found in ./config/config-schema.yml. These two files can be used as a reference when configuring Stroom-Proxy.

In the snippets of YAML configuration below, the default sections

Basic Structure

Stroom-Proxy has a number of key functions which are all configured via its YAML configuration file.

The following YAML shows the high level structure of the Stroom-Proxy configuration file. Each branch of this YAML is explained in more detail below.

proxyConfig:

  # This should be set to a value that is unique within your Stroom/Stroom-Proxy estate.
  # It is used in the unique ReceiptId that is set in the meta of received data, so it
  # provides provenance of where data was received at each stage.
  proxyId: null

  # If true, Stroom-Proxy will halt on start up if any errors are found in the YAML
  # configuration file. If false, the errors will simply be logged. Setting this to
  # false is not advised.
  haltBootOnConfigValidationFailure: true

  # Configuration of the base and temp paths used by Stroom-Proxy.
  # See Path Configuration below
  path:

  # This is the downstream (in flow of stream data terms) Stroom/Stroom-Proxy instance/cluster
  # used for feed status checks, supplying data receipt rules and verifying API keys.
  downstreamHost:

  # This controls the aggregation of received data into larger chunks prior to forwarding.
  # This is typically required to prevent Stroom receiving lots of small streams.
  aggregator:

  # If receive.receiptCheckMode is FEED_STATUS, this controls the feed status
  # checking. See Feed Status Configuration below.
  feedStatus:

  # Zero to many HTTP POST based destinations.
  # E.g. for forwarding to Stroom or another Stroom-Proxy
  forwardHttpDestinations:

  # Zero to many file system based destinations. See Forward Configuration below.
  forwardFileDestinations:

  # This controls the meta entries that will be included in the send and receive logs.
  logStream:

  # If receive.receiptCheckMode is RECEIPT_POLICY, this controls the fetching
  # of the policy rules.
  receiptPolicy:

  # This section is common to both Stroom and Stroom-Proxy
  # See Receive Configuration below.
  receive:

  # Configuration for authentication. See Security Configuration below.
  security:

Stroom-Proxy should be configured to check the receipt status of feeds when it receives data. This is done by configuring the endpoint of a downstream stroom-proxy or stroom.

  feedStatus:
    url: "http://stroom:8080/api/feedStatus/v1"
    apiKey: ""

The url should be the URL of the feed status API on the downstream stroom(-proxy). If the downstream instance is on the same host, you can use the http endpoint. If it is on a remote host, you should use https and the host of its nginx, e.g. https://downstream-instance/api/feedStatus/v1.

In order to use the API, the proxy must have a configured apiKey. The API key must be created in the downstream stroom instance and then copied into this configuration.
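For example, a proxy that checks feed status against a Stroom on a remote host might use the following. downstream-instance is the placeholder host name from the text above, and the API key value is also a placeholder.

```yaml
proxyConfig:
  feedStatus:
    # Remote downstream host, so use https via its nginx (placeholder host name).
    url: "https://downstream-instance/api/feedStatus/v1"
    # Created in the downstream Stroom and copied here (placeholder value).
    apiKey: "...PROVIDE API KEY..."
```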

If the proxy is configured to forward data then the forward destination(s) should be set. This is the datafeed endpoint of the downstream stroom-proxy or stroom instance that data will be forwarded to. This may also be the address of a load balancer or similar that is fronting a cluster of stroom-proxy or stroom instances. See also Feed status certificate configuration.

  forwardHttpDestinations:
    - enabled: true
      name: "downstream"
      forwardUrl: "https://some-host/stroom/datafeed"

forwardUrl specifies the URL of the datafeed endpoint on the destination host. Each forward location can use a different key/trust store pair. See also Forwarding certificate configuration.

If the proxy is configured to store then the location of the proxy repository may need to be configured if it needs to be in a different location to the proxy home directory, e.g. on another mount point.

Aggregator Configuration

proxyConfig:
  aggregator:
    enabled: true
    # Whether to split received ZIPs if they are too large.
    splitSources: true
    # Maximum number of items to include in an aggregate
    maxItemsPerAggregate: 1000
    # Maximum size of the aggregate in uncompressed bytes.
    # Aggregates may be larger than this if splitSources is false or single very
    # large streams are received.
    maxUncompressedByteSize: "1G"
    # The length of time that data is added to an aggregate for before the aggregate is closed.
    aggregationFrequency: "PT10M"

Note

The aggregator settings apply to all forwarders. It is not possible for forwarders to use different aggregation settings.

If you need to forward to an HTTP destination but also want to forward to a file destination using different aggregator settings (e.g. to keep a local archive of the data), you will need a second Stroom-Proxy. Stroom-Proxy A forwards to the downstream HTTP destination and also forwards to Stroom-Proxy B over HTTP. Stroom-Proxy B forwards to a file destination using much larger aggregator thresholds.
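The following sketches the two-proxy arrangement described above. It is shown as two YAML documents separated by `---`, one for each proxy's own config.yml. The host names, the archive path and the larger thresholds for Stroom-Proxy B are hypothetical values chosen for illustration.

```yaml
# config.yml for Stroom-Proxy A
proxyConfig:
  aggregator:
    maxItemsPerAggregate: 1000
    aggregationFrequency: "PT10M"
  forwardHttpDestinations:
  - enabled: true
    name: "downstream"
    forwardUrl: "https://some-host/stroom/datafeed"
  - enabled: true
    name: "proxy-b"
    # Hypothetical URL of the datafeed endpoint on Stroom-Proxy B.
    forwardUrl: "https://proxy-b-host/datafeed"
---
# config.yml for Stroom-Proxy B
proxyConfig:
  aggregator:
    # Hypothetical, much larger thresholds for archiving.
    maxItemsPerAggregate: 10000
    maxUncompressedByteSize: "10G"
    aggregationFrequency: "PT1H"
  forwardFileDestinations:
  - enabled: true
    name: "archive"
    # Hypothetical archive location.
    path: "/data/stroom-proxy-archive"
```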

Directory Scanner Configuration

This configuration controls the directories that Stroom-Proxy scans to look for ZIP files to ingest. It is primarily used as a means of manually re-processing files that have failed to forward, either as a result of too many retries or due to an unrecoverable error.

proxyConfig:
  dirScanner:
    # One or more directories to scan.
    # If the path is relative it is treated as relative to the proxyConfig.path.home property.
    dirs:
    - "zip_file_ingest"
    # Whether directory scanning is enabled or not
    enabled: true
    # The directory to move any failed files to.
    # If the path is relative it is treated as relative to the proxyConfig.path.home property.
    failureDir: "zip_file_ingest_failed"
    # How frequently each directory is scanned for files.
    scanFrequency: "PT1M"
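For example, to re-process failed ZIP files you would copy them into one of the scanned directories. The sketch below scans the default relative directory plus an extra absolute directory. The absolute path and the five minute scan frequency are hypothetical.

```yaml
proxyConfig:
  dirScanner:
    enabled: true
    dirs:
    # Relative, so resolved against proxyConfig.path.home.
    - "zip_file_ingest"
    # Absolute, hypothetical directory on another mount point.
    - "/mnt/reprocess/zips"
    # Files that fail to ingest are moved here (relative to proxyConfig.path.home).
    failureDir: "zip_file_ingest_failed"
    scanFrequency: "PT5M"
```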

Downstream Host Configuration

This is the default downstream (in flow of stream data terms) Stroom/Stroom-Proxy instance/cluster used for feed status checks, supplying data receipt rules and verifying API keys.

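It is also used as the default for HTTP forwarding. If a forward destination does not set forwardUrl or livenessCheckUrl, these are derived from the downstreamHost configuration.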

proxyConfig:
  downstreamHost:
    # http or https
    scheme: "https"
    # If not set, will default to 80/443 depending on scheme
    port: 443
    hostname: "...STROOM-PROXY OR STROOM FQDN..."
    # If not using OpenID authentication you will need to provide an API key.
    apiKey: "sak_6a011e3e5d_oKimmDxfNwj......<truncated>.....HYQxHaR2"

Event Store Configuration

The Event Store is used to store and aggregate individual events received via the /api/event API or the SQS Connectors. Events are appended to files specific to the Feed and Stream Type of the event. Once a threshold is reached, the file will be rolled and processed by Stroom-Proxy.

Each event is stored as a JSON line in the file.

proxyConfig:
  eventStore:
    # The size of an internal queue used to buffer aggregates that are ready to process.
    forwardQueueSize: 1000
    # The maximum age of the file before it is rolled.
    maxAge: "PT1M"
    # The maximum size of the file before it is rolled.
    maxByteCount: 9223372036854775807
    # The maximum number of events in the file before it is rolled.
    maxEventCount: 9223372036854775807
    # Configuration of the cache used for the event store.
    openFilesCache:
    # The frequency at which files are checked to see if they need to be rolled or not.
    rollFrequency: "PT10S"
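With the defaults, the count and size limits are effectively unlimited, so files are rolled only by age. The sketch below also rolls on event count and size. The thresholds are hypothetical, and maxByteCount is assumed to be in bytes. Files are only checked every rollFrequency, so a file may be rolled slightly after a threshold is reached.

```yaml
proxyConfig:
  eventStore:
    # The file is rolled when any one of these thresholds is reached.
    maxAge: "PT5M"
    maxEventCount: 10000
    # 100 MiB, assuming the value is in bytes (100 * 1024 * 1024).
    maxByteCount: 104857600
    rollFrequency: "PT10S"
```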

Feed Status Configuration

The configuration for performing feed status checks. This section is only relevant if proxyConfig.receive.receiptCheckMode is set to FEED_STATUS.

proxyConfig:
  feedStatus:
    # Standard cache configuration block for configuring the cache of feed status check outcomes
    feedStatusCache:
    # The full URL to use for feed status checking.
    # ONLY set this if using a non-standard URL, otherwise
    # it will be derived from the downstreamHost.
    url: null

The configuration of the client certificates for feed status checks is done using the DOWNSTREAM jersey client configuration. See Stroom and Stroom-Proxy Common Configuration.
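The following minimal sketch enables feed status checking and leaves the URL to be derived from downstreamHost. The host name and API key are placeholders.

```yaml
proxyConfig:
  receive:
    # Feed status checks only take place in this mode.
    receiptCheckMode: "FEED_STATUS"
  downstreamHost:
    scheme: "https"
    hostname: "...STROOM-PROXY OR STROOM FQDN..."
    apiKey: "...PROVIDE API KEY..."
  feedStatus:
    # Left unset so the URL is derived from downstreamHost.
    url: null
```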

Forward Configuration

Stroom-Proxy has two configuration branches for controlling forwarding as each has a different structure.

proxyConfig:
  # Zero to many HTTP POST based destinations.
  forwardHttpDestinations:
  # Zero to many file system based destinations.
  forwardFileDestinations:

Both types of forwarder have an enabled property. If a forwarder’s enabled property is set to false, it is as if the forwarder configuration does not exist, i.e. no data will be queued for that forwarder until it is set back to true.
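For example, a proxy with one active HTTP destination and one temporarily disabled file destination might be configured like this. The file destination's name and path are hypothetical.

```yaml
proxyConfig:
  forwardHttpDestinations:
  - enabled: true
    name: "downstream"
    forwardUrl: "https://some-host/stroom/datafeed"
  forwardFileDestinations:
  # Disabled, so it is treated as if it were not configured and no data is queued for it.
  - enabled: false
    # Names must be unique across all file and HTTP destinations.
    name: "local-archive"
    path: "/data/stroom-proxy-archive"
```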

File Forward Destinations Configuration

proxyConfig:
  # Zero to many file system based destinations.
  forwardFileDestinations:
    # Stroom-Proxy will attempt to move files onto the forward destination using an atomic move.
    # This ensures that the move does not happen more than once. If an atomic move is not possible,
    # e.g. the destination is a remote file system that does not support an atomic move, then it will
    # fall back to a non-atomic move with the risk of it happening more than once. If you see warnings
    # in the logs or know the file system will not support atomic moves then set this to false
  - atomicMoveEnabled: true
    # Whether this destination is enabled or not.
    enabled: true
    # If Instant Forwarding is to be used.
    instant: false
    # The type of liveness check to perform:
    # READ - will attempt to read the file/dir specified in livenessCheckPath. 
    # WRITE - will attempt to touch the file specified in livenessCheckPath.
    livenessCheckMode: "READ"
    # The path to use for regular liveness checking of this forward destination.
    # If null, empty or if the 'queue' property is not configured, then no liveness check
    # will be performed and the destination will be
    # assumed to be healthy. If livenessCheckMode is READ, livenessCheckPath can be a
    # directory or a file and stroom-proxy will attempt to check it can read the
    # file/directory. If livenessCheckMode is WRITE, then livenessCheckPath must be a
    # file and stroom-proxy will attempt to touch that file. It is
    # only recommended to set this property for a remote file system where
    # connection issues may be likely. If it is a relative path, it will be assumed
    # to be relative to 'path'
    livenessCheckPath: null
    # The unique name of the destination (across all file/http forward destinations).
    # The name is used in the directories on the file system, so do not change the name
    # once proxy has processed data. Must be provided.
    name: "...PROVIDE FORWARDER NAME..."
    # The base path of a directory to forward to.
    path: "...PROVIDE PATH..."
    # See Queue Configuration section below
    queue:
    # The templated relative sub-path of path.
    # The default path template is '${year}${month}${day}/${feed}'
    # Cannot be an absolute path and must resolve to a descendant of path.
    # For details of this configuration branch, see Path Templating Configuration below.
    subPathTemplate: null
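The following sketches a file destination on a network share, assuming the share does not support atomic moves. The name and mount point are hypothetical. The sub-path template puts the feed name before the date, which differs from the default.

```yaml
proxyConfig:
  forwardFileDestinations:
  - enabled: true
    # Do not change this once the proxy has processed data.
    name: "nfs-archive"
    # Hypothetical mount point of a network file share.
    path: "/mnt/nfs/stroom-archive"
    # Assumes the share cannot perform atomic moves.
    atomicMoveEnabled: false
    subPathTemplate:
      enabled: true
      pathTemplate: "${feed}/${year}${month}${day}"
      templatingMode: "REPLACE_UNKNOWN_PARAMS"
```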

HTTP Forward Destinations Configuration

proxyConfig:
  # Zero to many HTTP POST based destinations.
  forwardHttpDestinations:
    # If true, add Open ID authentication headers to the request. Only works if the identityProviderType
    # is EXTERNAL_IDP and the destination is in the same Open ID Connect realm as the OIDC client that this
    # proxy instance is using.
  - addOpenIdAccessToken: false
    # The API key to use when forwarding data if Stroom is configured to require an API key.
    # Does NOT use the API Key from downstreamHost config.
    apiKey: null
    # Whether this destination is enabled or not.
    enabled: true
    forwardHeadersAdditionalAllowSet: []
    # The full URL to forward to if different from <downstreamHost>/datafeed
    forwardUrl: null
    # Configuration of the HTTP client, see below.
    httpClient:
    # If Instant Forwarding is to be used.
    instant: false
    # Whether liveness checking of the HTTP destination will take place. The queue property
    # must also be configured for liveness checking to happen
    livenessCheckEnabled: true
    # The URL/path to check for liveness of the forward destination. The URL should return a 200 response
    # to a GET request for the destination to be considered live.
    # If the response from the liveness check is not a 200, forwarding
    # will be paused at least until the next liveness check is performed.
    # If this property is not set, the downstreamHost configuration will be combined with the default API
    # path (/status).
    # If this property is just a path, it will be combined with the downstreamHost configuration.
    # Only set this property if you wish to use a non-default path,
    # or you want to use a different host/port/scheme from that defined in downstreamHost.
    livenessCheckUrl: null
    # The unique name of the destination (across all file/http forward destinations).
    # The name is used in the directories on the file system, so do not change the name
    # once proxy has processed data. Must be provided.
    name: "...PROVIDE FORWARDER NAME..."
    # See Queue Configuration section below
    queue:
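The following sketches an HTTP destination that forwards to an explicit URL with its own API key. The URL reuses the example host from earlier in this section. The key is a placeholder.

```yaml
proxyConfig:
  forwardHttpDestinations:
  - enabled: true
    name: "downstream"
    # Only needed if the target differs from <downstreamHost>/datafeed.
    forwardUrl: "https://some-host/stroom/datafeed"
    # Not taken from downstreamHost, so it must be set here if Stroom requires one.
    apiKey: "...PROVIDE API KEY..."
    livenessCheckEnabled: true
    # Left unset, so /status on downstreamHost is used.
    livenessCheckUrl: null
    # Liveness checking only happens when queue is configured.
    queue:
      queueAndRetryEnabled: true
```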

Queue Configuration

Each forward destination (whether file or HTTP) has a queue configuration property that controls various aspects of forwarding, e.g. failure handling, delays, concurrency, etc.

  forwardHttpDestinations / forwardFileDestinations:
    queue:
      # The sub-path template to use for data that could not be retried
      # or has reached a retry limit.
      errorSubPathTemplate:
        enabled: true
        pathTemplate: "${year}${month}${day}/${feed}"
        templatingMode: "REPLACE_UNKNOWN_PARAMS"
      # A delay to add before forwarding. Primarily for testing.
      forwardDelay: "PT0S"
      # Number of threads to process retries
      forwardRetryThreadCount: 1
      # Number of threads to handle forwarding
      forwardThreadCount: 5
      # Duration between liveness checks
      livenessCheckInterval: "PT1M"
      # The maximum time from the first failed forward attempt to continue retrying.
      # After this the data will be moved to the failure directory permanently.
      maxRetryAge: "P7D"
      # The maximum time between retries. Must be greater than or equal to retryDelay.
      maxRetryDelay: "P1D"
      # If false, forwards will be attempted immediately and any failure will result in the
      # data being moved to the failure directory.
      queueAndRetryEnabled: false
      # The time between retries. If retryDelayGrowthFactor is >1, this value will grow
      # after each retry.
      retryDelay: "PT10M"
      # The factor to apply to retryDelay after each failed retry.
      retryDelayGrowthFactor: 1.0
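The sketch below shows how the retry properties interact. It enables queueing and doubles the delay after each failure. The delay sequence in the comments assumes that retryDelay is multiplied by retryDelayGrowthFactor after every failed retry and capped at maxRetryDelay. The destination name is hypothetical.

```yaml
proxyConfig:
  forwardHttpDestinations:
  - enabled: true
    name: "downstream"
    queue:
      queueAndRetryEnabled: true
      retryDelay: "PT10M"
      retryDelayGrowthFactor: 2.0
      maxRetryDelay: "P1D"
      maxRetryAge: "P7D"
      # Assumed delays between successive retries:
      #   10m, 20m, 40m, 80m, 160m, 320m, 640m, 1280m (21h20m),
      #   then 1440m (P1D, the cap) for every later retry.
      # Once 7 days have passed since the first failure, the data is
      # moved to the failure directory.
```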

Path Templating Configuration

The following properties all share the same structure:

proxyConfig.forwardFileDestinations.[n].subPathTemplate
proxyConfig.forwardFileDestinations.[n].queue.errorSubPathTemplate
proxyConfig.forwardHttpDestinations.[n].queue.errorSubPathTemplate

  xxxxxxTemplate:
    # Whether templating is enabled or not. If not enabled
    # no sub-path will be used.
    enabled: true
    # The template to use for the sub-path
    pathTemplate: "${year}${month}${day}/${feed}"
    # Controls how unknown parameters are dealt with. One of:
    # IGNORE_UNKNOWN_PARAMS - e.g. 'cat/${unknownparam}/dog' => 'cat/${unknownparam}/dog'
    # REMOVE_UNKNOWN_PARAMS - e.g. 'cat/${unknownparam}/dog' => 'cat/dog'
    # REPLACE_UNKNOWN_PARAMS - Replace unknown with 'XXX', e.g. 'cat/${unknownparam}/dog' => 'cat/XXX/dog'
    templatingMode: "REPLACE_UNKNOWN_PARAMS"

The following template parameters are supported:

${feed} - The Feed name.
${type} - The Stream Type.
${year} - The 4 digit year of the current date/time.
${month} - The 2 digit month of the current date/time.
${day} - The 2 digit day of the current date/time.
${hour} - The 2 digit hour of the current date/time.
${minute} - The 2 digit minute of the current date/time.
${second} - The 2 digit second of the current date/time.
${millis} - The 3 digit milliseconds of the current date/time.
${ms} - The current date/time as milliseconds since the Unix Epoch.
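As a worked example, assume a hypothetical feed MY_FEED with stream type Events is forwarded at 2024-03-05T14:07:09.123Z, and that the date/time parameters are rendered in UTC. The template below should then resolve as shown in the final comment.

```yaml
subPathTemplate:
  enabled: true
  pathTemplate: "${feed}/${type}/${year}/${month}/${day}/${hour}"
  templatingMode: "REPLACE_UNKNOWN_PARAMS"
# Expected sub-path under the assumptions above: MY_FEED/Events/2024/03/05/14
```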

Liveness Checking

Each of the configured forward destinations has a liveness check that can be configured. This allows Stroom Proxy to periodically check that the destination is live. If the liveness check fails for a destination, all forwarding for that destination will be paused until a subsequent liveness check reports it as live again.

The liveness checks take the following forms:

HTTP Destination: Performs a GET request to the URL configured using forwardHttpDestinations.[n].livenessCheckUrl. If this is not configured, it uses /status on the downstream host. The destination is considered live if it gets a 200 response. You can use a URL that allows the destination to control its own liveness, e.g. so it can take itself offline during an upgrade.
File Destination: Reads or writes (touch) to a file defined by forwardFileDestinations.[n].livenessCheckPath. Liveness checking for a file destination may be useful if the destination is on a network file share. livenessCheckMode controls whether a read or write to the file is performed.
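The following sketches liveness checking for one destination of each type. The file destination's name, mount point and check path are hypothetical. The check path would presumably need to exist for a READ check to succeed.

```yaml
proxyConfig:
  forwardHttpDestinations:
  - enabled: true
    name: "downstream"
    livenessCheckEnabled: true
    # Unset, so a GET to /status on downstreamHost is used; live only on a 200.
    livenessCheckUrl: null
    queue:
      livenessCheckInterval: "PT30S"
  forwardFileDestinations:
  - enabled: true
    name: "nfs-archive"
    path: "/mnt/nfs/stroom-archive"
    # READ accepts a file or directory; WRITE needs a file that can be touched.
    livenessCheckMode: "READ"
    # Relative, so resolved against 'path' (hypothetical name).
    livenessCheckPath: "liveness-check"
    # No liveness check is performed unless queue is configured.
    queue:
      livenessCheckInterval: "PT1M"
```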

HTTP Client Configuration

proxyConfig:
  forwardHttpDestinations:
    httpClient:
      connectionRequestTimeout: "PT3M"
      connectionTimeout: "PT3M"
      cookiesEnabled: false
      keepAlive: "PT0S"
      maxConnections: 1024
      maxConnectionsPerRoute: 1024
      proxy: null
      retries: 0
      timeToLive: "PT1H"
      timeout: "PT3M"
      # Transport Layer Security, see below.
      tls: null
      userAgent: null
      validateAfterInactivityPeriod: "PT0S"

The tls branch of the configuration is for configuring Transport Layer Security (the successor to Secure Sockets Layer (SSL)). It is null by default, i.e. no additional TLS configuration is used. Its structure is:

proxyConfig:
  forwardHttpDestinations:
    httpClient:
      tls:
        protocol: "TLSv1.2"
        # The name of the JCE provider to use on client side for cryptographic support 
        # (for example, SunJCE, Conscrypt, BC, etc). See Oracle documentation for more information.
        provider:
        # The path of the key store file
        keyStorePath: null
        # The password of the key store file
        keyStorePassword: null
        # The type of key store (usually JKS, PKCS12, JCEKS, Windows-MY, or Windows-ROOT).
        keyStoreType: "JKS"
        keyStoreProvider: null
        # The path of the trust store file
        trustStorePath: null
        # The password of the trust store file
        trustStorePassword: null
        # The type of trust store (usually JKS, PKCS12, JCEKS, Windows-MY, or Windows-ROOT).
        trustStoreType: "JKS"
        trustStoreProvider: null
        trustSelfSignedCertificates: false
        verifyHostname: false
        # A list of protocols (e.g., SSLv3, TLSv1) which are supported.
        # All other protocols will be refused.
        supportedProtocols: null
        # A list of cipher suites (e.g., TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256) which are supported.
        # All other cipher suites will be refused.
        supportedCiphers: null
        certAlias: null
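The following sketches client certificate (mutual TLS) configuration for an HTTP destination. The store paths and passwords are hypothetical placeholders. For demo or test purposes, they could point at copies of the client.jks and ca.jks files supplied with the docker stack.

```yaml
proxyConfig:
  forwardHttpDestinations:
  - enabled: true
    name: "downstream"
    httpClient:
      tls:
        protocol: "TLSv1.2"
        # Client certificate presented to the destination (hypothetical path).
        keyStorePath: "/opt/stroom-proxy/certs/client.jks"
        keyStorePassword: "...PROVIDE PASSWORD..."
        keyStoreType: "JKS"
        # CA certificates used to verify the destination (hypothetical path).
        trustStorePath: "/opt/stroom-proxy/certs/ca.jks"
        trustStorePassword: "...PROVIDE PASSWORD..."
        trustStoreType: "JKS"
        verifyHostname: true
```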

Log Stream Configuration

This controls the meta entries that will be included in the send and receive logs.

proxyConfig:
  logStream:
    # The header attributes that will be output in the send/receive log lines.
    # They will be output in the order that they appear in this list.
    # Duplicates will be ignored, case does not matter.
    metaKeys:
      - "guid"
      - "receiptid"
      - "feed"
      - "system"
      - "environment"
      - "remotehost"
      - "remoteaddress"
      - "remotedn"
      - "remotecertexpiry"

Path Configuration

proxyConfig:
  path:
    # By default all files read or written to by stroom-proxy will be in directories relative to
    # the home location. Ideally this should differ from the location of the Stroom Proxy
    # installed software as it has a different lifecycle.
    # If not set the location of the Stroom-Proxy application JAR file will be used and if that
    # can't be determined, <user's home>/.stroom will be used.
    home: "...SET TO AN ABSOLUTE PATH..."
    # The location for Stroom-Proxy's persisted data
    data: "data"
    # The location for any temporary files/directories.
    # If not set, will use a sub-directory called 'stroom-proxy' in the system temp dir,
    # i.e. as defined by 'java.io.tmpdir'.
    temp: null

All paths in the configuration file can be either relative or absolute. If relative then they will be treated as being relative to the home path.
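For example, with a hypothetical home of /var/lib/stroom-proxy, relative and absolute paths would resolve as shown in the comments.

```yaml
proxyConfig:
  path:
    # Hypothetical absolute location, separate from the installed software.
    home: "/var/lib/stroom-proxy"
    # Relative, so this resolves to /var/lib/stroom-proxy/data.
    data: "data"
    # Absolute, so this is used as-is (hypothetical path).
    temp: "/tmp/stroom-proxy"
```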

Receipt Policy Configuration

This section of configuration is only applicable if proxyConfig.receive.receiptCheckMode is RECEIPT_POLICY. It controls the fetching of the receipt policy rules from a downstream Stroom or Stroom-Proxy.

proxyConfig:
  receiptPolicy:
    # Only set if using a non-standard URL, else this is derived based on downstreamHost
    # config.
    receiveDataRulesUrl: null
    # The duration between calls to fetch the latest policy rules.
    syncFrequency: "PT1M"

The configuration of the client certificates for receipt policy checks is done using the DOWNSTREAM jersey client configuration. See Stroom and Stroom-Proxy Common Configuration.
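The following sketches switching receipt checking to the receipt policy rules and fetching them every five minutes instead of every minute. The rules URL is left to be derived from downstreamHost.

```yaml
proxyConfig:
  receive:
    # The receiptPolicy branch only applies in this mode.
    receiptCheckMode: "RECEIPT_POLICY"
  receiptPolicy:
    # Unset, so this is derived from downstreamHost.
    receiveDataRulesUrl: null
    syncFrequency: "PT5M"
```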

Receive Configuration

The receive configuration is common to both Stroom and Stroom-Proxy, see Receive Configuration

Security Configuration

proxyConfig:
  security:
    authentication:
      # This property is currently not used
      authenticationRequired: true
      # Open ID Connect configuration
      openId:

The openId branch of the config is common to both Stroom and Stroom-Proxy, see Open ID Configuration for details.

Amazon Simple Queue Service Configuration

Stroom-Proxy is able to consume messages from multiple AWS SQS queues. Each message received from a queue will be added to the Event Store for aggregation by Feed and Stream Type.

proxyConfig:
  # Zero to many connectors
  sqsConnectors:
    # This property is not currently used
  - awsProfileName: null
    # The name of the AWS region the SQS queue exists in.
    awsRegionName: "...AWS REGION..."
    # The maximum time to wait when polling the queue for messages
    pollFrequency: "PT10S"
    # This property is not currently used
    queueName: null
    # The URL of the Amazon SQS queue from which messages are received.
    queueUrl: "...SQS QUEUE URL..."
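The following sketches two connectors consuming from queues in different regions. The regions, account ID and queue names are hypothetical. The URLs follow the standard SQS form https://sqs.REGION.amazonaws.com/ACCOUNT-ID/QUEUE-NAME.

```yaml
proxyConfig:
  sqsConnectors:
  - awsRegionName: "eu-west-2"
    # Hypothetical account ID and queue name.
    queueUrl: "https://sqs.eu-west-2.amazonaws.com/123456789012/stroom-events-a"
    pollFrequency: "PT10S"
  - awsRegionName: "us-east-1"
    # Hypothetical account ID and queue name.
    queueUrl: "https://sqs.us-east-1.amazonaws.com/123456789012/stroom-events-b"
    pollFrequency: "PT20S"
```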

Thread Configuration

Stroom-Proxy is able to run certain operations in parallel. This configuration allows you to increase the number of threads used for each operation.

proxyConfig:
  threads:
    # Number of threads to consume from the aggregate input queue.
    aggregateInputQueueThreadCount: 1
    # Number of threads to consume from the forwarding input queue. 
    forwardingInputQueueThreadCount: 1
    # Number of threads to consume from the pre-aggregate input queue.
    preAggregateInputQueueThreadCount: 1
    # Number of threads to consume from the zip splitting input queue.
    zipSplittingInputQueueThreadCount: 1
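For example, on a host with spare CPU capacity you might raise some of these counts. The values below are illustrative only and would need to be tuned against your own load.

```yaml
proxyConfig:
  threads:
    # Illustrative values, not recommendations.
    aggregateInputQueueThreadCount: 2
    forwardingInputQueueThreadCount: 4
    preAggregateInputQueueThreadCount: 2
    zipSplittingInputQueueThreadCount: 1
```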

Deploying without Docker

Apart from the structure of the config.yml file, the configuration in a non-docker environment is the same as for stroom.

As part of a docker stack

Stroom-Proxy is configured in essentially the same way as stroom. The only real difference is the structure of the config.yml file, as noted above. As with stroom, the docker stack comes with a ./volumes/stroom-proxy-*/config/config.yml file that will be used in the absence of a provided one. The config.yml file also supports environment variable substitution, as stroom's does. This means it can use environment variables set in the stack .env file and passed down via the docker-compose YAML files.
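The following sketches environment variable substitution in config.yml, assuming it follows the ${VARIABLE:-default} syntax of Dropwizard's environment variable substitution. The variable names and the default value are hypothetical.

```yaml
proxyConfig:
  # Uses STROOM_PROXY_ID if set, otherwise falls back to "proxy-01" (hypothetical names).
  proxyId: "${STROOM_PROXY_ID:-proxy-01}"
  downstreamHost:
    # No default given, so the variable is expected to be set in the stack .env file.
    hostname: "${STROOM_PROXY_DOWNSTREAM_HOST}"
```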

Certificates

Stroom-proxy makes use of client certificates for two purposes:

Communicating with a downstream stroom/stroom-proxy to establish the receipt status of the feeds it has received data for.
Forwarding data to a downstream stroom/stroom-proxy.

The stack comes with the following files that can be used for demo/test purposes.

volumes/stroom-proxy-*/certs/ca.jks
volumes/stroom-proxy-*/certs/client.jks

For a production deployment these will need to be replaced with the certificates that are appropriate for your environment.

Typical Configuration

The following is a guide to typical configurations for operating Stroom-Proxy in different use cases.

Store and Forward

This is a typical case where you want to aggregate received data then forward it to a downstream Stroom or Stroom-Proxy, but also retain a store of the aggregates.

server:
  applicationContextPath: /
  adminContextPath: /proxyAdmin
  applicationConnectors:
    - type: http
      port: "8090"
      useForwardedHeaders: true
  adminConnectors:
    - type: http
      port: "8091"
      useForwardedHeaders: true
  detailedJsonProcessingExceptionMapper: true
  requestLog:
    appenders:
      # Log appender for the web server request logging
    - type: file
      currentLogFilename: logs/access/access.log
      discardingThreshold: 0
      # Rolled and gzipped every minute
      archivedLogFilenamePattern: logs/access/access-%d{yyyy-MM-dd'T'HH:mm}.log.gz
      # One week using minute files
      archivedFileCount: 10080
      logFormat: '%h %l "%u" [%t] "%r" %s %b "%i{Referer}" "%i{User-Agent}" %D'

logging:
  level: WARN
  loggers:
    # Logs useful information about stroom proxy. Only set DEBUG on specific 'stroom' classes or packages
    # due to the large volume of logs that would be produced for all of 'stroom' in DEBUG.
    stroom: INFO
    # Logs useful information about dropwizard when booting stroom
    io.dropwizard: INFO
    # Logs useful information about the jetty server when booting stroom
    org.eclipse.jetty: INFO
    # Set this to INFO if you want to log all REST requests/responses with headers/payloads.
    org.glassfish.jersey.logging.LoggingFeature: OFF

    # Logger and appender for proxy receipt audit logs
    "receive":
      level: INFO
      additive: false
      appenders:
      - type: file
        currentLogFilename: logs/receive/receive.log
        discardingThreshold: 0
        # Rolled and gzipped every minute
        archivedLogFilenamePattern: logs/receive/receive-%d{yyyy-MM-dd'T'HH:mm}.log.gz
        # One week using minute files
        archivedFileCount: 10080
        logFormat: "%-6level [%d{yyyy-MM-dd'T'HH:mm:ss.SSS'Z'}] [%t] %logger - %X{code} %msg %n"

    # Logger and appender for proxy send audit logs
    "send":
      level: INFO
      additive: false
      appenders:
      - type: file
        currentLogFilename: logs/send/send.log
        discardingThreshold: 0
        # Rolled and gzipped every minute
        archivedLogFilenamePattern: logs/send/send-%d{yyyy-MM-dd'T'HH:mm}.log.gz
        # One week using minute files
        archivedFileCount: 10080
        logFormat: "%-6level [%d{yyyy-MM-dd'T'HH:mm:ss.SSS'Z'}] [%t] %logger - %X{code} %msg %n"

  appenders:

    # Log to stdout, use this if running in Docker
  - type: console
    # Multi-coloured log format for console output
    logFormat: "%highlight(%-6level) [%d{\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",UTC}] [%green(%t)] %cyan(%logger) - %X{code} %msg %n"
    timeZone: UTC

    # Minute rolled files for stroom/datafeed, will be curl'd/deleted by stroom-log-sender
  - type: file
    currentLogFilename: logs/app/app.log
    discardingThreshold: 0
    archivedLogFilenamePattern: logs/app/app-%d{yyyy-MM-dd'T'HH:mm}.log.gz
    # One week using minute files
    archivedFileCount: 10080
    logFormat: "%-6level [%d{\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",UTC}] [%t] %logger - %X{code} %msg %n"

# jerseyClients are used for making feed status and content sync REST calls
jerseyClients:
  default:
    tls:
      keyStorePath: "certs/client.jks"
      keyStorePassword: "password"
      trustStorePath: "certs/ca.jks"
      trustStorePassword: "password"

# This section contains the Stroom Proxy configuration properties
# For more information see:
# https://gchq.github.io/stroom-docs/user-guide/properties.html
proxyConfig:
  path:
    # By default all files read or written to by stroom-proxy will be in directories relative to
    # the home location. This must be set to an absolute path that is different from the location
    # of the installed software, as it has a different lifecycle.
    home: "/stroomdata/stroom-proxy/home"
  # This is the downstream (in datafeed flow terms) stroom/stroom-proxy used for
  # feed status checks, supplying data receipt rules and verifying API keys.
  downstreamHost:
    scheme: "https"
    port: "443"
    hostname: "stroom.some.domain"
    apiKey: "...API KEY..."

  aggregator:
    maxItemsPerAggregate: 1000
    maxUncompressedByteSize: "1G"
    aggregationFrequency: 10m

  forwardFileDestinations:
  - name: "archive-repo"
    path: "/stroomdata/stroom-proxy/archive-repo"
    subPathTemplate:
      pathTemplate: "${year}/${year}-${month}/${year}-${month}-${day}/${year}-${month}-${day}-${feed}/"

  forwardHttpDestinations:
  - name: "downstream-stroom"
    httpClient:
      tls:
        keyStorePath: "certs/client.jks"
        keyStorePassword: "password"
        trustStorePath: "certs/ca.jks"
        trustStorePassword: "password"

  receive:
    receiptCheckMode: "RECEIPT_POLICY"
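
The aggregator block bounds each aggregate by item count, uncompressed size and age. The following Python sketch illustrates those three limits under stated assumptions that are not taken from the Stroom-Proxy source: an aggregate is closed as soon as any one limit is reached, and "1G" is read as 1024³ bytes. The `Aggregate` class and `should_close` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta

# Values mirror the aggregator block above.
# Assumption: "1G" is treated as 1024**3 bytes in this sketch.
MAX_ITEMS = 1000                    # maxItemsPerAggregate
MAX_UNCOMPRESSED_BYTES = 1024 ** 3  # maxUncompressedByteSize: "1G"
MAX_AGE = timedelta(minutes=10)     # aggregationFrequency: 10m


@dataclass
class Aggregate:
    """Hypothetical view of an open aggregate."""
    item_count: int
    uncompressed_bytes: int
    age: timedelta


def should_close(agg: Aggregate) -> bool:
    """Return True if any one of the three limits has been reached (assumed rule)."""
    return (
        agg.item_count >= MAX_ITEMS
        or agg.uncompressed_bytes >= MAX_UNCOMPRESSED_BYTES
        or agg.age >= MAX_AGE
    )


print(should_close(Aggregate(10, 5_000, timedelta(minutes=3))))    # -> False
print(should_close(Aggregate(1000, 5_000, timedelta(minutes=3))))  # -> True
print(should_close(Aggregate(10, 5_000, timedelta(minutes=10))))   # -> True
```

Under this reading, a quiet feed still produces an aggregate at least every 10 minutes, while a busy feed is split by item count or size well before that.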

Air-Gapped Store Only

This is an example of a Stroom-Proxy instance that is hosted in an environment where it has no direct link to a downstream Stroom/Stroom-Proxy. All data is aggregated and forwarded to the local file system, from where it is transported downstream by other means that are outside the scope of this documentation.

server:
  # ... Same as configuration above

logging:
  # ... Same as configuration above

jerseyClients:
  # ... Same as configuration above

proxyConfig:
  path:
    # By default all files read or written to by stroom-proxy will be in directories relative to
    # the home location. This must be set to an absolute path that is different from the location
    # of the installed software, as it has a different lifecycle.
    home: "/stroomdata/stroom-proxy/home"

  # downstreamHost is disabled due to the air-gap
  downstreamHost:
    enabled: false

  aggregator:
    maxItemsPerAggregate: 1000
    maxUncompressedByteSize: "1G"
    aggregationFrequency: 10m

  forwardFileDestinations:

  # Repo for a local archive
  - name: "archive-repo"
    path: "/stroomdata/stroom-proxy/archive-repo"
    subPathTemplate:
      pathTemplate: "${year}/${year}-${month}/${year}-${month}-${day}/${year}-${month}-${day}-${feed}/"

  # Repo to be transported downstream around air-gap
  - name: "downstream-repo"
    path: "/stroomdata/stroom-proxy/downstream-repo"
    subPathTemplate:
      pathTemplate: "${year}/${year}-${month}/${year}-${month}-${day}/${year}-${month}-${day}-${feed}/"

  forwardHttpDestinations: []

  receive:
    # No receipt checking due to air-gap. All data accepted.
    receiptCheckMode: "RECEIVE_ALL"
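
Both examples use the same pathTemplate, so each feed's data for a given day lands in its own nested directory beneath the destination path. The Python sketch below shows that expansion. It assumes the date parts are zero-padded and handles only the `${year}`, `${month}`, `${day}` and `${feed}` variables used above; the feed name `TEST-EVENTS` and the function `expand_path_template` are hypothetical.

```python
from datetime import datetime, timezone
from string import Template

PATH_TEMPLATE = "${year}/${year}-${month}/${year}-${month}-${day}/${year}-${month}-${day}-${feed}/"


def expand_path_template(template: str, received: datetime, feed: str) -> str:
    """Substitute the ${...} variables used in the example pathTemplate.

    Assumes zero-padded date parts; only year, month, day and feed are handled.
    """
    return Template(template).substitute(
        year=f"{received.year:04d}",
        month=f"{received.month:02d}",
        day=f"{received.day:02d}",
        feed=feed,
    )


received = datetime(2024, 3, 7, 14, 30, tzinfo=timezone.utc)
sub_path = expand_path_template(PATH_TEMPLATE, received, "TEST-EVENTS")
print("/stroomdata/stroom-proxy/archive-repo/" + sub_path)
# -> /stroomdata/stroom-proxy/archive-repo/2024/2024-03/2024-03-07/2024-03-07-TEST-EVENTS/
```

Grouping by year, then month, then day keeps directory sizes manageable and makes it simple to pick up one day's data for transport around the air-gap.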
