Configuration

Stroom and its associated services can be deployed in many ways (single node docker stack, non-docker cluster, Kubernetes, etc.). This document will cover two types of deployment:

  • Single node stroom_core docker stack.
  • A mixed deployment with nginx in docker and stroom, stroom-proxy and the database not in docker.

This document will explain how each application/service is configured and where its configuration files live.

Application Configuration

The following sections provide links to how to configure each application.

General configuration of docker stacks

Environment variables

The stroom docker stacks have a single env file <stack name>.env that acts as a single point to configure some aspects of the stack. Setting values in the env file can be useful when the value is shared between multiple containers. This env file sets environment variables that are then used for variable substitution in the docker compose YAML files, e.g.

    environment:
      - MYSQL_ROOT_PASSWORD=${STROOM_DB_ROOT_PASSWORD:-my-secret-pw}

In this example the environment variable STROOM_DB_ROOT_PASSWORD is read and used to set the environment variable MYSQL_ROOT_PASSWORD in the docker container. If STROOM_DB_ROOT_PASSWORD is not set then the value my-secret-pw is used instead.
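The :- default syntax behaves the same way in a POSIX shell as it does in the compose file, so it can be tried out locally. The following is a minimal sketch; the function name resolve_root_password is hypothetical and exists only for illustration.

```shell
# Hypothetical helper: mimics the substitution docker compose performs for
#   MYSQL_ROOT_PASSWORD=${STROOM_DB_ROOT_PASSWORD:-my-secret-pw}
resolve_root_password() {
  printf '%s\n' "${STROOM_DB_ROOT_PASSWORD:-my-secret-pw}"
}
```

With STROOM_DB_ROOT_PASSWORD unset or empty the function prints my-secret-pw, otherwise it prints the value of the variable.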

The environment variables set in the env file are NOT automatically visible inside the containers. Only those environment variables defined in the environment section of the docker-compose YAML files are visible. These environment entries can either be hard coded values or use environment variables from outside the container. In some cases the names in the env file and the names of the environment variables set in the containers are the same; in others they are different.

The environment variables set in the containers can then be used by the application running in each container to set its configuration. For example, stroom’s config.yml file also uses variable substitution, e.g.

appConfig:
  commonDbDetails:
    connection:
      jdbcDriverClassName: "${STROOM_JDBC_DRIVER_CLASS_NAME:-com.mysql.cj.jdbc.Driver}"

In this example jdbcDriverClassName will be set to the value of environment variable STROOM_JDBC_DRIVER_CLASS_NAME or com.mysql.cj.jdbc.Driver if that is not set.

The following example shows how setting MY_ENV_VAR=123 means myProperty will ultimately get a value of 123 and not its default of 789.

env file (stroom<stack name>.env) - MY_ENV_VAR=123
                |
                |
                | environment variable substitution
                |
                v
docker compose YAML (01_stroom.yml) - STROOM_ENV_VAR=${MY_ENV_VAR:-456}
                |
                |
                | environment variable substitution
                |
                v
Stroom configuration file (config.yml) - myProperty: "${STROOM_ENV_VAR:-789}"
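The chain above can be replayed in a shell to see which default wins at each layer. This is only a sketch of the substitution order, not how the stack itself is implemented; the function name resolve_my_property and the variable stroom_env_var are hypothetical.

```shell
# Hypothetical helper that replays the three layers in order.
# Layer 1: the env file may or may not set MY_ENV_VAR.
# Layer 2: the compose YAML derives STROOM_ENV_VAR from it (default 456).
# Layer 3: config.yml derives myProperty from STROOM_ENV_VAR (default 789).
resolve_my_property() {
  stroom_env_var="${MY_ENV_VAR:-456}"
  # Layer 2 always supplies a value here, so the 789 default is never used.
  printf '%s\n' "${stroom_env_var:-789}"
}
```

With MY_ENV_VAR=123 this prints 123. With MY_ENV_VAR unset it prints 456 rather than 789, because the compose file default is applied first. The 789 default would only be expected to apply if STROOM_ENV_VAR were not set in the container at all.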

Note that environment variables are only set into the container on start. Any changes to the env file will not take effect until the container is (re)started.

Configuration files

The following shows the basic structure of a stack with respect to the location of the configuration files:

── stroom_core_test-vX.Y.Z
   ├── config                [stack env file and docker compose YAML files]
   └── volumes
       └── <service>
           └── conf/config   [service specific configuration files]

Some aspects of configuration do not lend themselves to environment variable substitution, e.g. deeply nested parts of stroom’s config.yml. In these instances it may be necessary to have static configuration files that have no connection to the env file or only use environment variables for some values.

Bind mounts

Everything in the stack's volumes directory is bind-mounted read-only into the corresponding docker container. This allows configuration files to be read by the container but not modified.

Typically the bind mounts mount a directory into the container, though in the case of the stroom-all-dbs.cnf file, the file is mounted. The mounts are done using the inode of the file/directory rather than the name, so docker will mount whatever the inode points to even if the name changes. If for instance the stroom-all-dbs.cnf file is renamed to stroom-all-dbs.cnf.old then copied to stroom-all-dbs.cnf and then the new version modified, the container would still see the old file.
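The rename-then-copy sequence described above can be reproduced in a scratch directory, without docker, to see which file keeps the original inode. This is a minimal sketch; the function names inode_of and demo_inode_after_rename are hypothetical, and mktemp -d is assumed to be available.

```shell
# Print the inode number of a file.
inode_of() {
  ls -i "$1" | awk '{print $1}'
}

# Reproduce the rename-then-copy sequence in a scratch directory and report
# which file ends up holding the inode that a bind mount would still reference.
demo_inode_after_rename() {
  dir=$(mktemp -d) || return 1
  echo "original" > "$dir/stroom-all-dbs.cnf"
  mounted_inode=$(inode_of "$dir/stroom-all-dbs.cnf")

  mv "$dir/stroom-all-dbs.cnf" "$dir/stroom-all-dbs.cnf.old"
  cp "$dir/stroom-all-dbs.cnf.old" "$dir/stroom-all-dbs.cnf"

  if [ "$(inode_of "$dir/stroom-all-dbs.cnf.old")" = "$mounted_inode" ]; then
    echo "mounted inode now belongs to stroom-all-dbs.cnf.old"
  fi
  if [ "$(inode_of "$dir/stroom-all-dbs.cnf")" != "$mounted_inode" ]; then
    echo "stroom-all-dbs.cnf is a new inode"
  fi
  rm -rf "$dir"
}
```

Calling demo_inode_after_rename shows that the original inode follows the renamed file, which is why the container keeps seeing the old content.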

Docker managed volumes

When stroom is running various forms of data are persisted, e.g. stroom’s stream store, stroom-all-dbs database files, etc. All this data is stored in docker managed volumes. By default these will be located in /var/lib/docker/volumes/<volume name>/_data and root/sudo access will be needed to access these directories.

Docker data root

IMPORTANT

By default Docker stores all its images, container layers and managed volumes in its default data root directory which defaults to /var/lib/docker. It is typical in server deployments for the root file system to be kept fairly small and this is likely to result in the root file system running out of space due to the growth in docker images/layers/volumes in /var/lib/docker. It is therefore strongly recommended to move the docker data root to another location with more space.

There are various options for achieving this. In all cases the docker daemon should be stopped prior to making the changes, e.g. service docker stop, then started afterwards.

  • Symlink - One option is to move the /var/lib/docker directory to a new location then create a symlink to it. For example:

    ln -s /large_mount/docker_data_root /var/lib/docker

    This has the advantage that anyone unaware that the data root has moved will be able to easily find it if they look in the default location.

  • Configuration - The location can be changed by adding this key to the file /etc/docker/daemon.json (or creating this file if it doesn’t exist):

    {
      "data-root": "/mnt/docker"
    }
    
  • Mount - If your intention is to use a whole storage device for the docker data root then you can mount that device to /var/lib/docker. You will need to make a copy of the /var/lib/docker directory prior to doing this, then copy it onto the mount once created. The process for setting up this mount will be OS dependent and is outside the scope of this document.

Active services

Each stroom docker stack comes pre-built with a number of different services, e.g. the stroom_core stack contains the following:

  • stroom
  • stroom-proxy-local
  • stroom-all-dbs
  • nginx
  • stroom-log-sender

While you can pass a set of service names to the commands like start.sh and stop.sh, it may sometimes be required to configure the stack instance to only have a set of services active. You can set the active services like so:

./set_services.sh stroom stroom-all-dbs nginx

In the above example, any subsequent use of commands like start.sh and stop.sh with no named services will only act upon the active services set by set_services.sh. This list of active services is held in ACTIVE_SERVICES.txt and the full list of available services is held in ALL_SERVICES.txt.

Certificates

A number of the services in the docker stacks will make use of SSL certificates/keys in various forms. The certificate/key files are typically found in the directories volumes/<service>/certs/.

The stacks come with a set of client/server certificates that can be used for demo/test purposes. For production deployments these should be replaced with the actual certificates/keys for your environment.

In general the best approach to configuring the certificates/keys is to replace the existing files with symlinks to the actual files. For example in the case of the server certificates for nginx (found in volumes/nginx/certs/) the directory would look like:

ca.pem.crt -> /some/path/to/certificate_authority.pem.crt
server.pem.crt -> /some/path/to/host123.pem.crt
server.unencrypted.key -> /some/path/to/host123.key

This approach avoids the need to change any configuration files to reference differently named certificate/key files and avoids having to copy your real certificates/keys into multiple places.
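As a minimal sketch of this approach, a small helper can replace a shipped demo file with a symlink to the real one. The function name link_cert is hypothetical, and the /some/path/to/... locations are the placeholder paths from the listing above.

```shell
# Hypothetical helper: replace a file in the stack's certs directory with a
# symlink to the real file.
# Arguments: <path to the real file> <path of the link in the certs directory>
link_cert() {
  ln -sf "$1" "$2"
}

# Example usage, run from the root of the stack:
#   link_cert /some/path/to/certificate_authority.pem.crt volumes/nginx/certs/ca.pem.crt
#   link_cert /some/path/to/host123.pem.crt volumes/nginx/certs/server.pem.crt
#   link_cert /some/path/to/host123.key volumes/nginx/certs/server.unencrypted.key
```

The -f option makes ln replace the existing demo file of the same name, so the file names referenced by the service configuration stay unchanged.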

For examples of how to create certificates, keys and keystores see createCerts.sh.

1 - Stroom and Stroom-Proxy Configuration

How to configure Stroom and Stroom-Proxy.

The Stroom and Stroom-Proxy applications are built on the same Dropwizard framework so have a lot of similarities when it comes to configuration.

The Stroom/Stroom-Proxy applications are essentially just an executable JAR file that can be run when provided with a configuration file, config.yml. This config file is common to all forms of deployment.

1.1 - Common Configuration

Configuration common to Stroom and Stroom-Proxy.

config.yml

This YAML file, sometimes known as the Dropwizard configuration file (as it conforms to a structure defined by Dropwizard) is the primary means of configuring Stroom/Stroom-Proxy. As a minimum this file should be used to configure anything that needs to be set before stroom can start up, e.g. web server, logging, database connection details, etc. It is also used to configure anything that is specific to a node in a stroom cluster.

If you are using some form of scripted deployment, e.g. Ansible, then it can be used to set all stroom properties for the environment that stroom runs in. If you are not using scripted deployments then you can maintain stroom’s node agnostic configuration properties via the user interface.

Config File Structure

This file contains both the Dropwizard configuration settings (settings for ports, paths and application logging) and the Stroom/Stroom-Proxy application specific properties configuration. The file is in YAML format and the application properties are located under the appConfig key. For details of the Dropwizard configuration structure, see the Dropwizard configuration reference.

The file is split into sections using these keys:

  • server - Configuration of the web server, e.g. ports, paths, request logging.
  • logging - Configuration of application logging
  • jerseyClients - Configuration of the various Jersey HTTP clients in use. See Jersey HTTP Client Configuration.
  • Application specific configuration:
    • appConfig - The Stroom configuration properties. These properties can be viewed/modified in the user interface.
    • proxyConfig - The Stroom-Proxy configuration properties. Stroom-Proxy has no user interface so these properties can only be set in this file.

The following is an example of the YAML configuration file for Stroom:

# Dropwizard configuration section
server:
  # e.g. ports and paths
logging:
  # e.g. logging levels/appenders

jerseyClients:
  DEFAULT:
    # Configuration of the named client

# Stroom properties configuration section
appConfig:
  commonDbDetails:
    connection:
      jdbcDriverClassName: ${STROOM_JDBC_DRIVER_CLASS_NAME:-com.mysql.cj.jdbc.Driver}
      jdbcDriverUrl: ${STROOM_JDBC_DRIVER_URL:-jdbc:mysql://localhost:3307/stroom?useUnicode=yes&characterEncoding=UTF-8}
      jdbcDriverUsername: ${STROOM_JDBC_DRIVER_USERNAME:-stroomuser}
      jdbcDriverPassword: ${STROOM_JDBC_DRIVER_PASSWORD:-stroompassword1}
  contentPackImport:
    enabled: true
  ...

The following is an example of the YAML configuration file for Stroom-Proxy:

# Dropwizard configuration section
server:
  # e.g. ports and paths
logging:
  # e.g. logging levels/appenders

jerseyClients:
  DEFAULT:
    # Configuration of the named client

# Stroom-Proxy properties configuration section
proxyConfig:
  path:
    home: /some/path
  ...

appConfig Section

The appConfig section is special as it maps to the Properties seen in the Stroom user interface so values can be managed in the file or via the Properties screen in the Stroom UI. The other sections of the file can only be managed via the YAML file. In the Stroom user interface, properties are named with a dot notation key, e.g. stroom.contentPackImport.enabled. Each part of the dot notation property name represents a key in the YAML file, e.g. for this example, the location in the YAML would be:

appConfig:
  contentPackImport:
    enabled: true   # stroom.contentPackImport.enabled

The stroom part of the dot notation name is replaced with appConfig.
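The mapping can be expressed as a one-line transformation. The following is only an illustrative sketch; the function name property_to_yaml_path is hypothetical and is not part of Stroom.

```shell
# Hypothetical helper: convert a UI property name to its YAML key path by
# swapping the leading 'stroom' segment for 'appConfig'.
property_to_yaml_path() {
  printf '%s\n' "$1" | sed 's/^stroom\./appConfig./'
}
```

For example, property_to_yaml_path stroom.contentPackImport.enabled prints appConfig.contentPackImport.enabled, where each dot-separated part is one level of nesting in the YAML file.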

For more details on the link between this YAML file and Stroom Properties, see Properties.

Variable Substitution

The YAML configuration file supports Bash style variable substitution in the form of:

${ENV_VAR_NAME:-value_if_not_set}

This allows values to be set either directly in the file or via an environment variable, e.g.

      jdbcDriverClassName: ${STROOM_JDBC_DRIVER_CLASS_NAME:-com.mysql.cj.jdbc.Driver}

In the above example, if the STROOM_JDBC_DRIVER_CLASS_NAME environment variable is not set then the value com.mysql.cj.jdbc.Driver will be used instead.

Typed Values

YAML supports typed values rather than just strings, see https://yaml.org/refcard.html. YAML understands booleans, strings, integers, floating point numbers, as well as sequences/lists and maps. Some properties will be represented differently in the user interface to the YAML file. This is due to how values are stored in the database and how the current user interface works. This will likely be improved in future versions. For details of how different types are represented in the YAML and the UI, see Data Types.

Server configuration

The server section controls the configuration of the Jetty web server.

For full details of how to configure the server section see the Dropwizard configuration reference.

The following is an example of the configuration for an application listening on HTTP.

server:
  # The base path for the main application
  applicationContextPath: "/"
  # The base path for the admin pages/API
  adminContextPath: "/stroomAdmin"

  # The scheme/port for the main application
  applicationConnectors:
    - type: http
      port: 8080
      # Uses X-Forwarded-*** headers in request log instead of proxy server details.
      useForwardedHeaders: true
  # The scheme/port for the admin pages/API
  adminConnectors:
    - type: http
      port: 8081
      useForwardedHeaders: true

Jersey HTTP Client Configuration

Stroom and Stroom Proxy use the Jersey client for making HTTP connections with other nodes or other systems (e.g. Open ID Connect identity providers). In the YAML file, the jerseyClients key controls the configuration of the various clients in use.

To allow complete control of the client configuration, Stroom uses the concept of named client configurations. Each named client will be unique to a destination (where a destination is typically a server or a cluster of functionally identical servers). Thus the configuration of the connections to each of those destinations can be configured independently.

The client names are as follows:

  • AWS_PUBLIC_KEYS - Connections to fetch AWS public keys used in Open ID Connect authentication.
  • CONTENT_SYNC - Connections to downstream proxy/stroom instances to sync content. (Stroom Proxy only).
  • DEFAULT - The default client configuration used if a named configuration is not present.
  • FEED_STATUS - Connections to downstream proxy/stroom instances to check feed status. (Stroom Proxy only).
  • OPEN_ID - Connections to an Open ID Connect identity provider, e.g. Cognito, Azure AD, KeyCloak, etc.
  • STROOM - Inter-node communications within the Stroom cluster (Stroom only).

The following is an example of how the clients are configured in the YAML file:

jerseyClients:
  DEFAULT:
    # Default client configuration, e.g.
    timeout: 500ms
  STROOM:
    # Configuration items for stroom inter-node communications
    timeout: 30s
  # etc.

The configuration keys (along with their default values and descriptions) for each client can be found in the Dropwizard configuration reference.

The following is another example including most keys:

jerseyClients:
  DEFAULT:
    minThreads: 1
    maxThreads: 128
    workQueueSize: 8
    gzipEnabled: true
    gzipEnabledForRequests: true
    chunkedEncodingEnabled: true
    timeout: 500ms
    connectionTimeout: 500ms
    timeToLive: 1h
    cookiesEnabled: false
    maxConnections: 1024
    maxConnectionsPerRoute: 1024
    keepAlive: 0ms
    retries: 0
    userAgent: <application name> (<client name>)
    proxy:
      host: 192.168.52.11
      port: 8080
      scheme: http
      auth:
        username: secret
        password: stuff
        authScheme: NTLM
        realm: realm
        hostname: host
        domain: WINDOWSDOMAIN
        credentialType: NT
      nonProxyHosts:
        - localhost
        - '192.168.52.*'
        - '*.example.com'
    tls:
      protocol: TLSv1.2
      provider: SunJSSE
      verifyHostname: true
      keyStorePath: /path/to/file
      keyStorePassword: changeit
      keyStoreType: JKS
      trustStorePath: /path/to/file
      trustStorePassword: changeit
      trustStoreType: JKS
      trustSelfSignedCertificates: false
      supportedProtocols: TLSv1.1,TLSv1.2
      supportedCipherSuites: TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256
      certAlias: alias-of-specific-cert

Logging Configuration

The Dropwizard configuration file controls all the logging by the application. In addition to the main application log, there are additional logs such as stroom user events (for audit), Stroom-Proxy send and receive logs and database migration logs.

For full details of the logging configuration, see Dropwizard Logging Configuration.

Request Log

The request log is slightly different to the other logs. It logs all requests to the web server. It is configured in the server section.

The property archivedLogFilenamePattern controls rolling of the active log file. The date pattern in the filename controls the frequency that the log files are rolled. In this example, files will be rolled every minute.

server:
  requestLog:
    appenders:
    - type: file
      currentLogFilename: logs/access/access.log
      discardingThreshold: 0
      # Rolled and gzipped every minute
      archivedLogFilenamePattern: logs/access/access-%d{yyyy-MM-dd'T'HH:mm}.log.gz
      archivedFileCount: 10080
      logFormat: '%h %l "%u" [%t] "%r" %s %b "%i{Referer}" "%i{User-Agent}" %D'

Logback Logs

Dropwizard uses Logback for application level logging. All logs in Stroom and Stroom-Proxy apart from the request log are Logback based logs.

Logback uses the concept of Loggers and Appenders. A Logger is a named thing that produces log messages. An Appender is an output that a Logger can append its log messages to. Typical Appenders are:

  • File - appends messages to a file that may or may not be rolled.
  • Console - appends messages to stdout.
  • Syslog - appends messages to syslog.

Loggers

A Logger can append to more than one Appender if required. For example, the default configuration file for Stroom has two appenders for the application logs. The rolled files from one appender are POSTed to Stroom to index its own logs and then deleted, while the files from the other are intended to remain on the server until archived off, to allow viewing by an administrator.

A Logger can be configured with a severity. Valid severities are TRACE, DEBUG, INFO, WARN and ERROR. The severity set on a logger means that only messages with that severity or higher will be logged; all other messages are discarded.
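The threshold rule can be illustrated with a small sketch. It assumes the standard Logback ordering of TRACE < DEBUG < INFO < WARN < ERROR; the function names severity_rank and is_logged are hypothetical and are not part of Stroom or Logback.

```shell
# Map a severity name to its rank (higher is more severe).
severity_rank() {
  case "$1" in
    TRACE) echo 0 ;;
    DEBUG) echo 1 ;;
    INFO)  echo 2 ;;
    WARN)  echo 3 ;;
    ERROR) echo 4 ;;
    *) return 1 ;;
  esac
}

# Succeeds if a message at severity $2 passes a logger configured at severity $1.
is_logged() {
  logger_rank=$(severity_rank "$1") || return 2
  message_rank=$(severity_rank "$2") || return 2
  [ "$message_rank" -ge "$logger_rank" ]
}
```

For example, is_logged INFO WARN succeeds because WARN is above the INFO threshold, whereas is_logged WARN INFO fails.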

Logger names are typically the name of the Java class that is producing the log message. You don’t need to understand too much about Java classes as you are only likely to change logger severities when requested by one of the developers. Some loggers, such as event-logger do not have a Java class name.

As an example this is a portion of a Stroom config.yml file to illustrate the different loggers/appenders:

logging:
  # This is the root logging severity level for all loggers. Only messages >= WARN will be logged unless overridden
  # for a specific logger
  level: WARN

  # All the named loggers
  loggers:
    # Logs useful information about stroom. Only set DEBUG on specific 'stroom' classes or packages
    # due to the large volume of logs that would be produced for all of 'stroom' in DEBUG.
    stroom: INFO
    # Logs useful information about dropwizard when booting stroom
    io.dropwizard: INFO
    # Logs useful information about the jetty server when booting stroom
    org.eclipse.jetty: INFO
    # Logs REST request/responses with headers/payloads. Set this to OFF to disable that logging.
    org.glassfish.jersey.logging.LoggingFeature: INFO
    # Logs summary information about FlyWay database migrations
    org.flywaydb: INFO
    # Logger and custom appender for audit logs
    event-logger:
      level: INFO
      # Prevents messages from this logger from being sent to other appenders
      additive: false
      appenders:
        - type: file
          currentLogFilename: logs/user/user.log
          discardingThreshold: 0
          # Rolled every minute
          archivedLogFilenamePattern: logs/user/user-%d{yyyy-MM-dd'T'HH:mm}.log
          # Minute rolled logs older than a week will be deleted. Note rolled logs are deleted
          # based on the age of the window they contain, not the number of them. This value should be greater
          # than the maximum time stroom is not producing events for.
          archivedFileCount: 10080
          logFormat: "%msg%n"
    # Logger and custom appender for the flyway DB migration SQL output
    org.flywaydb.core.internal.sqlscript:
      level: DEBUG
      additive: false
      appenders:
        - type: file
          currentLogFilename: logs/migration/migration.log
          discardingThreshold: 0
          # Rolled every day
          archivedLogFilenamePattern: logs/migration/migration-%d{yyyy-MM-dd}.log
          archivedFileCount: 10
          logFormat: "%-6level [%d{\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",UTC}] [%t] %logger - %X{code} %msg %n"

Appenders

The following is an example of the default appenders that will be used for all loggers unless they have their own custom appender configured.

logging:
  # Appenders for all loggers except for where a logger has a custom appender configured
  appenders:

    # stdout
  - type: console
    # Multi-coloured log format for console output
    logFormat: "%highlight(%-6level) [%d{\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",UTC}] [%green(%t)] %cyan(%logger) - %X{code} %msg %n"
    timeZone: UTC
#
    # Minute rolled files for stroom/datafeed, will be curl'd/deleted by stroom-log-sender
  - type: file
    currentLogFilename: logs/app/app.log
    discardingThreshold: 0
    # Rolled and gzipped every minute
    archivedLogFilenamePattern: logs/app/app-%d{yyyy-MM-dd'T'HH:mm}.log.gz
    # One week using minute files
    archivedFileCount: 10080
    logFormat: "%-6level [%d{\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",UTC}] [%t] %logger - %X{code} %msg %n"

Log Rolling

Log files can be rolled based on file size or on time. The archivedLogFilenamePattern property controls the rolling behaviour. The rolling policy is determined from the filename pattern, e.g. a pattern with a minute precision date format will be rolled every minute. The following is an example of an appender that rolls based on the size of the log file:

  - type: file
    currentLogFilename: logs/app.log
    # The name pattern, where %i is a sequential number indicating age, where 1 is the most recent
    archivedLogFilenamePattern: logs/app-%i.log
    # The maximum number of rolled files to keep
    archivedFileCount: 10
    # The maximum size of a log file
    maxFileSize: "100MB"
    logFormat: "%-6level [%d{\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",UTC}] [%t] %logger - %X{code} %msg %n"

The following is an example of an appender that rolls every minute to gzipped files:

  - type: file
    currentLogFilename: logs/app/app.log
    # Rolled and gzipped every minute
    archivedLogFilenamePattern: logs/app/app-%d{yyyy-MM-dd'T'HH:mm}.log.gz
    # One week using minute files
    archivedFileCount: 10080
    logFormat: "%-6level [%d{\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",UTC}] [%t] %logger - %X{code} %msg %n"
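The archivedFileCount value of 10080 used with minute-rolled files follows from the number of minutes in a week. As a quick check, the arithmetic can be sketched as below; the function name minute_files_for_days is hypothetical.

```shell
# Number of minute-rolled files needed to cover a given number of days.
minute_files_for_days() {
  echo $(( $1 * 24 * 60 ))
}
```

For example, minute_files_for_days 7 prints 10080, the value used in the examples above for one week of minute files.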

1.2 - Stroom Configuration

Describes how the Stroom application is configured.

General configuration

The Stroom application is essentially just an executable JAR file that can be run when provided with a configuration file, config.yml. This config file is common to all forms of deployment.

config.yml

Stroom operates on a configuration by exception basis so all configuration properties will have a sensible default value and a property only needs to be explicitly configured if the default value is not appropriate, e.g. for tuning a large scale production deployment or where values are environment specific. As a result config.yml only contains a minimal set of properties. The full tree of properties can be seen in ./config/config-defaults.yml and a schema for the configuration tree (along with descriptions for each property) can be found in ./config/config-schema.yml. These two files can be used as a reference when configuring stroom.

Key Configuration Properties

The following are key properties that would typically be changed for a production deployment. All configuration branches are relative to the appConfig root.

The database name(s), hostname(s), port(s), username(s) and password(s) should be configured using these properties. Typically stroom is configured to keep its statistics data in a separate database to the main stroom database, as is configured below.

  commonDbDetails:
    connection:
      jdbcDriverUrl: "jdbc:mysql://localhost:3307/stroom?useUnicode=yes&characterEncoding=UTF-8"
      jdbcDriverUsername: "stroomuser"
      jdbcDriverPassword: "stroompassword1"
  statistics:
    sql:
      db:
        connection:
          jdbcDriverUrl: "jdbc:mysql://localhost:3307/stats?useUnicode=yes&characterEncoding=UTF-8"
          jdbcDriverUsername: "statsuser"
          jdbcDriverPassword: "stroompassword1"

In a clustered deployment each node must be given a node name that is unique within the cluster. This is used to identify nodes in the Nodes screen. It could be the hostname of the node or follow some other naming convention.

  node:
    name: "node1a"

Each node should have its identity on the network configured so that it uses the appropriate FQDNs. The nodeUri hostname is the FQDN of each node and used by nodes to communicate with each other, therefore it can be private to the cluster of nodes. The publicUri hostname is the public facing FQDN for stroom, i.e. the address of a load balancer or Nginx. This is the address that users will use in their browser.

  nodeUri:
    hostname: "localhost" # e.g. node5.stroomnodes.somedomain
  publicUri:
    hostname: "localhost" # e.g. stroom.somedomain

Deploying without Docker

Stroom running without docker has two files to configure it. The following locations are relative to the stroom home directory, i.e. the root of the distribution zip.

  • ./config/config.yml - Stroom configuration YAML file
  • ./config/scripts.env - Stroom scripts configuration env file

The distribution also includes these files which are helpful when it comes to configuring stroom.

  • ./config/config-defaults.yml - Full version of the config.yml file containing all branches/leaves with default values set. Useful as a reference for the structure and the default values.
  • ./config/config-schema.yml - The schema defining the structure of the config.yml file.

scripts.env

This file is used by the various shell scripts like start.sh, stop.sh, etc. This file should not need to be changed unless you want to change the locations where certain log files are written to or need to change the Java memory settings.

In a production system it is highly likely that you will need to increase the java heap size as the default is only 2G. The heap size settings and any other java command line options can be set by changing:

JAVA_OPTS="-Xms512m -Xmx2048m"

As part of a docker stack

When stroom is run as part of one of our docker stacks, e.g. stroom_core, there are some additional layers of configuration to take into account, but the configuration is still primarily done using the config.yml file.

Stroom’s config.yml file is found in the stack in ./volumes/stroom/config/ and this is the primary means of configuring Stroom.

The stack also ships with a default config.yml file baked into the docker image. This minimal fallback file (located in /stroom/config-fallback/ inside the container) will be used in the absence of one provided in the docker stack configuration (./volumes/stroom/config/).

The default config.yml file uses environment variable substitution so some configuration items will be set by environment variables set into the container by the stack env file and the docker-compose YAML. This approach is useful for configuration values that need to be used by multiple containers, e.g. the public FQDN of Nginx, so it can be configured in one place.

If you need to further customise the stroom configuration then it is recommended to edit the ./volumes/stroom/config/config.yml file. This can either be a simple file with hard coded values or one that uses environment variables for some of its configuration items.

The configuration works as follows:

env file (stroom<stack name>.env)
                |
                |
                | environment variable substitution
                |
                v
docker compose YAML (01_stroom.yml)
                |
                |
                | environment variable substitution
                |
                v
Stroom configuration file (config.yml)

Ansible

If you are using Ansible to deploy a stack then it is recommended that all of stroom’s configuration properties are set directly in the config.yml file using a templated version of the file and to NOT use any environment variable substitution. When using Ansible, the Ansible inventory is the single source of truth for your configuration so not using environment variable substitution for stroom simplifies the configuration and makes it clearer when looking at deployed configuration files.

Stroom-ansible has an example inventory for a single node stroom stack deployment. The group_vars/all file shows how values can be set into the env file.

1.3 - Stroom Proxy Configuration

Describes how the Stroom-Proxy application is configured.

The configuration of Stroom-proxy is very much the same as for Stroom with the only difference being the structure of the application specific part of the config.yml file. Stroom-proxy has a proxyConfig key in the YAML while Stroom has appConfig.

General configuration

The Stroom-proxy application is essentially just an executable JAR file that can be run when provided with a configuration file, config.yml. This configuration file is common to all forms of deployment.

config.yml

Stroom-proxy does not have a user interface so the config.yml file is the only way of configuring stroom-proxy. As with stroom, the config.yml file is split into three sections using these keys:

  • server - Configuration of the web server, e.g. ports, paths, request logging.
  • logging - Configuration of application logging
  • proxyConfig - Stroom-Proxy specific configuration

See also Properties for more details on structure of the config.yml file and supported data types.

Stroom-Proxy operates on a configuration by exception basis so all configuration properties will have a sensible default value and a property only needs to be explicitly configured if the default value is not appropriate, e.g. for tuning a large scale production deployment or where values are environment specific. As a result config.yml only contains a minimal set of properties. The full tree of properties can be seen in ./config/config-defaults.yml and a schema for the configuration tree (along with descriptions for each property) can be found in ./config/config-schema.yml. These two files can be used as a reference when configuring stroom-proxy.

Key Configuration Properties

Stroom-proxy has two main functions, storing and forwarding. It can be configured to do either or both of these functions. These functions are enabled/disabled using:

proxyConfig:

  # The list of named destinations that Stroom-Proxy will forward to
  forwardHttpDestinations:
    - enabled: true
      name: "downstream"
      forwardUrl: "https://some-host/stroom/datafeed"

  # Whether to store received data in a repository
  repository:
    storingEnabled: true

  # If we are storing data in a proxy repository we can aggregate it before forwarding.
  aggregator:
    maxItemsPerAggregate: 1000
    maxUncompressedByteSize: "1G"
    maxAggregateAge: 10m
    aggregationFrequency: 1m

Stroom-proxy should be configured to check the receipt status of feeds on receipt of data. This is done by configuring the end point of a downstream stroom-proxy or stroom.

  feedStatus:
    url: "http://stroom:8080/api/feedStatus/v1"
    apiKey: ""

The url should be the url for the feed status API on the downstream stroom(-proxy). If this is on the same host then you can use the http endpoint, however if it is on a remote host then you should use https and the host of its nginx, e.g. https://downstream-instance/api/feedStatus/v1.

In order to use the API, the proxy must have a configured apiKey. The API key must be created in the downstream stroom instance and then copied into this configuration.

If the proxy is configured to forward data then the forward destination(s) should be set. This is the datafeed endpoint of the downstream stroom-proxy or stroom instance that data will be forwarded to. This may also be the address of a load balancer or similar that is fronting a cluster of stroom-proxy or stroom instances. See also Feed status certificate configuration.

  forwardHttpDestinations:
    - enabled: true
      name: "downstream"
      forwardUrl: "https://some-host/stroom/datafeed"

forwardUrl specifies the URL of the datafeed endpoint on the destination host. Each forward location can use a different key/trust store pair. See also Forwarding certificate configuration.

If the proxy is configured to store then the location of the proxy repository may need to be configured if it needs to be in a different location to the proxy home directory, e.g. on another mount point.

Deploying without Docker

Apart from the structure of the config.yml file, the configuration in a non-docker environment is the same as for stroom.

As part of a docker stack

The way stroom-proxy is configured is essentially the same as for stroom, with the only real difference being the structure of the config.yml file as noted above. As with stroom, the docker stack comes with a ./volumes/stroom-proxy-*/config/config.yml file that will be used in the absence of a provided one. Also as with stroom, the config.yml file supports environment variable substitution, so it can make use of environment variables set in the stack env file and passed down via the docker-compose YAML files.
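The substitution described above can be pictured with the following minimal Python sketch. It assumes config.yml uses the same ${NAME:-default} form shown for the docker-compose files. The variable name FORWARD_URL is hypothetical, and the real substitution is performed by the application when it reads config.yml, not by this code.

```python
import re

# Matches ${NAME} or ${NAME:-default}
VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def substitute(text, env):
    """Replace ${NAME} / ${NAME:-default} using the mapping `env`.

    Simplification: the default is used only when NAME is absent from `env`.
    A reference with no value and no default is left unchanged.
    """
    def _replace(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        return match.group(0)

    return VAR_PATTERN.sub(_replace, text)


# FORWARD_URL is a hypothetical variable name used only for this example.
line = 'forwardUrl: "${FORWARD_URL:-https://some-host/stroom/datafeed}"'

print(substitute(line, {}))
print(substitute(line, {"FORWARD_URL": "https://load-balancer/stroom/datafeed"}))
```

In a stack the mapping would come from the container's environment, which in turn is populated from the env file via the environment section of the docker-compose YAML.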

Certificates

Stroom-proxy makes use of client certificates for two purposes:

  • Communicating with a downstream stroom/stroom-proxy in order to establish the receipt status for the feeds it has received data for.
  • When forwarding data to a downstream stroom/stroom-proxy

The stack comes with the following files that can be used for demo/test purposes.

volumes/stroom-proxy-*/certs/ca.jks
volumes/stroom-proxy-*/certs/client.jks

For a production deployment these will need to be changed, see Certificates

Feed status certificate configuration

The configuration of the client certificates for feed status checks is done using the FEED_STATUS jersey client configuration. See Stroom and Stroom-Proxy Common Configuration.

Forwarding certificate configuration

Stroom-proxy can forward to multiple locations. The configuration of the certificate(s) for the forwarding locations is as follows:

proxyConfig:

  forwardHttpDestinations:
    - enabled: true
      name: "downstream"
      forwardUrl: "https://some-host/stroom/datafeed"
      sslConfig:
        keyStorePath: "/stroom-proxy/certs/client.jks"
        keyStorePassword: "password"
        keyStoreType: "JKS"
        trustStorePath: "/stroom-proxy/certs/ca.jks"
        trustStorePassword: "password"
        trustStoreType: "JKS"
        hostnameVerificationEnabled: true

forwardUrl specifies the URL of the datafeed endpoint on the destination host. Each forward location can use a different key/trust store pair.

2 - Nginx Configuration

Configuring Nginx for use with Stroom and Stroom Proxy.

Nginx is the standard web server used by stroom. Its primary role is SSL termination and reverse proxying for the stroom and stroom-proxy instances that sit behind it. It can also load balance incoming requests and ensure traffic from the same source is always routed to the same upstream instance. Other web servers can be used if required but their installation/configuration is outside the scope of this documentation.

Without Docker

The standard way of deploying Nginx when stroom is running without docker is to run Nginx in docker as part of the services stack. See below for details of how to configure it. Nginx can also be deployed without docker, but that is outside the scope of this documentation.

As part of a docker stack

Nginx is included in all the stroom docker stacks. Nginx is configured using multiple configuration files to aid clarity and allow reuse of sections of configuration. The main file for configuring Nginx is nginx.conf.template and this makes use of other files via include statements.

The purpose of the various files is as follows:

  • nginx.conf.template - Top level configuration file that orchestrates the other files.
  • logging.conf.template - Configures the logging output, its content and format.
  • server.conf.template - Configures things like SSL settings, timeouts, ports, buffering, etc.
  • Upstream configuration
    • upstreams.stroom.ui.conf.template - Defines the upstream host(s) for stroom node(s) that are dedicated to serving the user interface.
    • upstreams.stroom.processing.conf.template - Defines the upstream host(s) for stroom node(s) that are dedicated to stream processing and direct data receipt.
    • upstreams.proxy.conf.template - Defines the upstream host(s) for local stroom-proxy node(s).
  • Location configuration
    • locations_defaults.conf.template - Defines some default directives (e.g. headers) for configuring stroom paths.
    • proxy_location_defaults.conf.template - Defines some default directives (e.g. headers) for configuring stroom-proxy paths.
    • locations.proxy.conf.template - Defines the various paths (e.g. /datafeed) that will be reverse proxied to stroom-proxy hosts.
    • locations.stroom.conf.template - Defines the various paths (e.g. /datafeeddirect) that will be reverse proxied to stroom hosts.

Templating

The nginx container has been configured to support using environment variables passed into it to set values in the Nginx configuration files. It should be noted that recent versions of Nginx have templating support built in. The templating mechanism used in stroom’s Nginx container was set up before this existed but achieves the same result.

All non-default configuration files for Nginx should be placed in volumes/nginx/conf/ and named with the suffix .template (even if no templating is needed). When the container starts any variables in these templates will be substituted and the resulting files will be copied into /etc/nginx. The result of the template substitution is logged to help with debugging.

The files can contain templating of the form:

ssl_certificate             /stroom-nginx/certs/<<<NGINX_SSL_CERTIFICATE>>>;

In this example <<<NGINX_SSL_CERTIFICATE>>> will be replaced with the value of environment variable NGINX_SSL_CERTIFICATE when the container starts.
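The effect of this templating can be approximated with the short Python sketch below. It is not the script used inside the container. The function name render_template and the certificate file name server.pem.crt are assumptions for illustration, and leaving unknown tags untouched is a choice made here rather than documented container behaviour.

```python
import re

# Tags look like <<<ENV_VAR_NAME>>>
TAG_PATTERN = re.compile(r"<<<([A-Za-z0-9_]+)>>>")


def render_template(text, env):
    """Substitute <<<NAME>>> tags from `env`.

    Returns the rendered text plus a list of tag names that had no value,
    so that unresolved tags can be reported instead of silently dropped.
    """
    missing = []

    def _replace(match):
        name = match.group(1)
        if name not in env:
            missing.append(name)
            return match.group(0)
        return env[name]

    return TAG_PATTERN.sub(_replace, text), missing


template_line = "ssl_certificate             /stroom-nginx/certs/<<<NGINX_SSL_CERTIFICATE>>>;"

# server.pem.crt is a hypothetical file name.
rendered, missing = render_template(
    template_line, {"NGINX_SSL_CERTIFICATE": "server.pem.crt"}
)
print(rendered)
print(missing)
```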

Upstreams

When configuring a multi node cluster you will need to configure the upstream hosts. Nginx acts as a reverse proxy for the applications behind it so the lists of hosts for each application need to be configured.

For example if you have a 10 node cluster and 2 of those nodes are dedicated for user interface use then the configuration would look like:

upstreams.stroom.ui.conf.template

server node1.stroomhosts:<<<STROOM_PORT>>>;
server node2.stroomhosts:<<<STROOM_PORT>>>;

upstreams.stroom.processing.conf.template

server node3.stroomhosts:<<<STROOM_PORT>>>;
server node4.stroomhosts:<<<STROOM_PORT>>>;
server node5.stroomhosts:<<<STROOM_PORT>>>;
server node6.stroomhosts:<<<STROOM_PORT>>>;
server node7.stroomhosts:<<<STROOM_PORT>>>;
server node8.stroomhosts:<<<STROOM_PORT>>>;
server node9.stroomhosts:<<<STROOM_PORT>>>;
server node10.stroomhosts:<<<STROOM_PORT>>>;

upstreams.proxy.conf.template

server node3.stroomhosts:<<<STROOM_PORT>>>;
server node4.stroomhosts:<<<STROOM_PORT>>>;
server node5.stroomhosts:<<<STROOM_PORT>>>;
server node6.stroomhosts:<<<STROOM_PORT>>>;
server node7.stroomhosts:<<<STROOM_PORT>>>;
server node8.stroomhosts:<<<STROOM_PORT>>>;
server node9.stroomhosts:<<<STROOM_PORT>>>;
server node10.stroomhosts:<<<STROOM_PORT>>>;

In the above example the port is set using templating as it is the same for all nodes. Nodes 1 and 2 will receive all UI and REST API traffic. Nodes 3-10 will serve all datafeed(direct) requests.

Certificates

The stack comes with a default server certificate/key and CA certificate for demo/test purposes. The files are located in volumes/nginx/certs/. For a production deployment these will need to be changed, see Certificates

Log rotation

The Nginx container makes use of logrotate to rotate Nginx’s log files after a period of time so that rotated logs can be sent to stroom. Logrotate is configured using the file volumes/stroom-log-sender/logrotate.conf.template. This file is templated in the same way as the Nginx configuration files, see above. The number of rotated files that are kept before the oldest is deleted is controlled by the line:

rotate 100

This should be set in conjunction with the frequency that logrotate is called, which is controlled by volumes/stroom-log-sender/crontab.txt. This crontab file drives the logrotate process and by default is set to run every minute.
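To see how the two settings interact, the following Python sketch estimates how long a rotated file survives before logrotate deletes it. It assumes, as a simplification, that a rotation actually happens on every scheduled run; if logrotate skips runs (for example because a log is empty or a size threshold is not met) files will be kept for longer. The function name retention_window is hypothetical.

```python
from datetime import timedelta


def retention_window(rotate_count, rotation_interval):
    """Approximate time a rotated log is kept before being deleted.

    rotate_count      -- the number given on the logrotate `rotate` line
    rotation_interval -- how often a rotation actually occurs
    """
    return rotate_count * rotation_interval


# `rotate 100` with a rotation every minute.
window = retention_window(100, timedelta(minutes=1))
print(window)
```

If logs cannot be sent to stroom for longer than this window, the oldest rotated files would be removed before they are sent, so the rotate value should allow for the longest outage you want to tolerate.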

3 - Stroom Log Sender Configuration

Stroom log sender is a docker image used for sending application logs to stroom. It is essentially just a combination of the send_to_stroom.sh script and a set of crontab entries to call the script at intervals.

Deploying without Docker

When deploying without docker, the stroom and stroom-proxy nodes will need to be configured to send their logs to stroom. This can be done using the ./bin/send_to_stroom.sh script in the stroom and stroom-proxy zip distributions and some crontab configuration.

The crontab file for the user account running stroom should be edited (crontab -e) and set to something like:

# stroom logs
* * * * * STROOM_HOME=<path to stroom home> ${STROOM_HOME}/bin/send_to_stroom.sh ${STROOM_HOME}/logs/access STROOM-ACCESS-EVENTS <datafeed URL> --system STROOM --environment <environment> --file-regex '.*/[a-z]+-[0-9]{4}-[0-9]{2}-[0-9]{2}T.*\\.log' --max-sleep 10 --key <key file> --cert <cert file> --cacert <CA cert file> --delete-after-sending --compress >> <path to log> 2>&1
* * * * * STROOM_HOME=<path to stroom home> ${STROOM_HOME}/bin/send_to_stroom.sh ${STROOM_HOME}/logs/app    STROOM-APP-EVENTS    <datafeed URL> --system STROOM --environment <environment> --file-regex '.*/[a-z]+-[0-9]{4}-[0-9]{2}-[0-9]{2}T.*\\.log' --max-sleep 10 --key <key file> --cert <cert file> --cacert <CA cert file> --delete-after-sending --compress >> <path to log> 2>&1
* * * * * STROOM_HOME=<path to stroom home> ${STROOM_HOME}/bin/send_to_stroom.sh ${STROOM_HOME}/logs/user   STROOM-USER-EVENTS   <datafeed URL> --system STROOM --environment <environment> --file-regex '.*/[a-z]+-[0-9]{4}-[0-9]{2}-[0-9]{2}T.*\\.log' --max-sleep 10 --key <key file> --cert <cert file> --cacert <CA cert file> --delete-after-sending --compress >> <path to log> 2>&1

# stroom-proxy logs
* * * * * PROXY_HOME=<path to proxy home> ${PROXY_HOME}/bin/send_to_stroom.sh ${PROXY_HOME}/logs/access  STROOM_PROXY-ACCESS-EVENTS  <datafeed URL> --system STROOM-PROXY --environment <environment> --file-regex '.*/[a-z]+-[0-9]{4}-[0-9]{2}-[0-9]{2}T.*\\.log' --max-sleep 10 --key <key file> --cert <cert file> --cacert <CA cert file> --delete-after-sending --compress >> <path to log> 2>&1
* * * * * PROXY_HOME=<path to proxy home> ${PROXY_HOME}/bin/send_to_stroom.sh ${PROXY_HOME}/logs/app     STROOM_PROXY-APP-EVENTS     <datafeed URL> --system STROOM-PROXY --environment <environment> --file-regex '.*/[a-z]+-[0-9]{4}-[0-9]{2}-[0-9]{2}T.*\\.log' --max-sleep 10 --key <key file> --cert <cert file> --cacert <CA cert file> --delete-after-sending --compress >> <path to log> 2>&1
* * * * * PROXY_HOME=<path to proxy home> ${PROXY_HOME}/bin/send_to_stroom.sh ${PROXY_HOME}/logs/send    STROOM_PROXY-SEND-EVENTS    <datafeed URL> --system STROOM-PROXY --environment <environment> --file-regex '.*/[a-z]+-[0-9]{4}-[0-9]{2}-[0-9]{2}T.*\\.log' --max-sleep 10 --key <key file> --cert <cert file> --cacert <CA cert file> --delete-after-sending --compress >> <path to log> 2>&1
* * * * * PROXY_HOME=<path to proxy home> ${PROXY_HOME}/bin/send_to_stroom.sh ${PROXY_HOME}/logs/receive STROOM_PROXY-RECEIVE-EVENTS <datafeed URL> --system STROOM-PROXY --environment <environment> --file-regex '.*/[a-z]+-[0-9]{4}-[0-9]{2}-[0-9]{2}T.*\\.log' --max-sleep 10 --key <key file> --cert <cert file> --cacert <CA cert file> --delete-after-sending --compress >> <path to log> 2>&1

where the environment specific values are:

  • <path to stroom home> - The absolute path to the stroom home, i.e. the location of the start.sh script.
  • <path to proxy home> - The absolute path to the stroom-proxy home, i.e. the location of the start.sh script.
  • <datafeed URL> - The URL that the logs will be sent to. This will typically be the nginx host or load balancer and the path will typically be https://host/datafeeddirect to bypass the proxy for faster access to the logs.
  • <environment> - The environment name that the stroom/proxy is deployed in, e.g. OPS, REF, DEV, etc.
  • <key file> - The absolute path to the SSL key file used by curl.
  • <cert file> - The absolute path to the SSL certificate file used by curl.
  • <CA cert file> - The absolute path to the SSL certificate authority file used by curl.
  • <path to log> - The absolute path to a log file to log all the send_to_stroom.sh output to.
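The --file-regex value in the crontab entries above selects only rolled log files that carry a date stamp in their name. The Python sketch below shows which paths such a pattern would pick up. It assumes the pattern is applied to the whole file path and reaches the regex engine with a single escaped dot (\.log); the example file names are hypothetical.

```python
import re

# Pattern from the crontab entries, written with a single escaped dot.
FILE_REGEX = re.compile(r".*/[a-z]+-[0-9]{4}-[0-9]{2}-[0-9]{2}T.*\.log")


def is_rolled_log(path):
    """True if the whole path matches the dated, rolled log file pattern."""
    return FILE_REGEX.fullmatch(path) is not None


# Hypothetical file names for illustration.
candidates = [
    "/stroom/logs/app/app-2024-01-31T10:15.log",   # rolled file
    "/stroom/logs/app/app.log",                    # file still being written
    "/stroom/logs/app/app-2024-01-31T10:15.log.gz" # different extension
]

for path in candidates:
    print(path, is_rolled_log(path))
```

Under these assumptions the undated file is never selected, which matters because the entries use --delete-after-sending.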

If your implementation of cron supports environment variables then you can define some of the common values at the top of the crontab file and use them in the entries. cronie, as used by CentOS, does not support environment variables in the crontab file, but variables can be defined at the line level as has been shown with STROOM_HOME and PROXY_HOME.

The above crontab entries assume that stroom and stroom-proxy are running on the same host. If they are not then the entries can be split across the hosts accordingly.

Service host(s)

When deploying stroom/stroom-proxy without docker you may still be deploying the service stack (nginx and stroom-log-sender) to a host. In this case see As part of a docker stack below for details of how to configure stroom-log-sender to send the nginx logs.

As part of a docker stack

Crontab

The docker stacks include the stroom-log-sender docker image for sending the logs of all the other containers to stroom. Stroom-log-sender is configured using the crontab file volumes/stroom-log-sender/conf/crontab.txt. When the container starts this file will be read. Any variables in it will be substituted with the values from the corresponding environment variables that are present in the container. These common values can be set in the config/<stack name>.env file.

As the variables are substituted on container start you will need to restart the container following any configuration change.

Certificates

The directory volumes/stroom-log-sender/certs contains the default client certificates used for the stack. These allow stroom-log-sender to send the log files over SSL which also provides stroom with details of the sender. These will need to be replaced in a production environment.

volumes/stroom-log-sender/certs/ca.pem.crt
volumes/stroom-log-sender/certs/client.pem.crt
volumes/stroom-log-sender/certs/client.unencrypted.key

For a production deployment these will need to be changed, see Certificates

4 - MySQL Configuration

Configuring MySQL for use with Stroom.

General configuration

MySQL is configured via the .cnf file which is typically located in one of these locations:

  • /etc/my.cnf
  • /etc/mysql/my.cnf
  • $MYSQL_HOME/my.cnf
  • <data dir>/my.cnf
  • ~/.my.cnf

Key configuration properties

  • lower_case_table_names - This property controls how the tables are stored on the filesystem and the case-sensitivity of table names in SQL. A value of 0 means tables are stored on the filesystem in the case used in CREATE TABLE and SQL is case sensitive. This is the default on Linux and is the preferred value for deployments of stroom v7+. A value of 1 means tables are stored on the filesystem in lowercase and SQL is case insensitive. See also Identifier Case Sensitivity

  • max_connections - The maximum permitted number of simultaneous client connections. For a clustered deployment of stroom, the default value of 151 will typically be too low. Each stroom node will hold a pool of open database connections for its use, therefore with a large number of stroom nodes and a big connection pool the total number of connections can be very large. This property should be set taking into account the values of the stroom properties of the form *.db.connectionPool.maxPoolSize. See also Connection Interfaces

  • innodb_buffer_pool_size/innodb_buffer_pool_instances - Controls the amount of memory available to MySQL for caching table/index data. Typically this will be set to 80% of available RAM, assuming MySQL is running on a dedicated host and the total amount of table/index data is greater than 80% of available RAM. Note: innodb_buffer_pool_size must be set to a value that is equal to or a multiple of innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances. See also Configuring InnoDB Buffer Pool Size
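The sizing rules for max_connections and innodb_buffer_pool_size can be worked through with a little arithmetic. The Python sketch below is only an estimate under stated assumptions: the node count, pool count, pool size, headroom and RAM figures are hypothetical, and the chunk size (128 MiB) and instance count (8) should be checked against the values in use on your MySQL server.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def required_connections(node_count, pools_per_node, max_pool_size, headroom=10):
    """Worst-case connections if every pool on every node is fully used.

    `headroom` leaves a few connections for administration and other clients.
    """
    return node_count * pools_per_node * max_pool_size + headroom


def aligned_buffer_pool_size(target_bytes, chunk_size=128 * MIB, instances=8):
    """Round `target_bytes` down to a multiple of chunk_size * instances.

    Rounding down keeps the result within the memory budget; at least one
    unit is always returned.
    """
    unit = chunk_size * instances
    return max(1, target_bytes // unit) * unit


# Hypothetical cluster: 10 stroom nodes, one pool each, maxPoolSize of 30.
print(required_connections(node_count=10, pools_per_node=1, max_pool_size=30))

# Hypothetical dedicated MySQL host with 64 GiB RAM, targeting 80%.
target = 64 * GIB * 80 // 100
print(aligned_buffer_pool_size(target) // GIB, "GiB")
```

With these example figures the worst case is well above the default of 151 connections, and the 80% target is trimmed to a whole multiple of the chunk size times the instance count.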

Deploying without Docker

When MySQL is deployed without a docker stack then MySQL should be installed and configured according to the MySQL documentation. How MySQL is deployed and configured will depend on the requirements of the environment, e.g. clustered, primary/standby, etc.

As part of a docker stack

Where a stroom docker stack includes stroom-all-dbs (MySQL) the MySQL instance is configured via the .cnf file. The .cnf file is located in volumes/stroom-all-dbs/conf/stroom-all-dbs.cnf. This file is read-only to the container and will be read on container start.

Database initialisation

When the container is started for the first time the database will be initialised with the root user account. It will also then run any scripts found in volumes/stroom-all-dbs/init/stroom. The scripts in here will be run in alphabetical order. Scripts of the form .sh, .sql, .sql.gz and .sql.template are supported.

.sql.template files are specific to the stroom stacks and are just templated .sql files. They can contain tags of the form <<<ENV_VAR_NAME>>> which will be replaced with the value of the named environment variable that has been set in the container.

If you need to add additional database users then either add them to volumes/stroom-all-dbs/init/stroom/001_create_databases.sql.template or create additional scripts/templates in that directory.
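Because the scripts run in alphabetical order, the name given to an additional script determines when it runs relative to the existing ones. The Python sketch below illustrates this. The file name 010_add_users.sql.template is hypothetical, and the plain string sort used here is an approximation of the ordering applied by the container's entry point.

```python
SUPPORTED_SUFFIXES = (".sh", ".sql", ".sql.gz", ".sql.template")


def execution_order(file_names):
    """Return the supported init scripts in the order they would be run."""
    supported = [name for name in file_names if name.endswith(SUPPORTED_SUFFIXES)]
    return sorted(supported)


files = [
    "010_add_users.sql.template",         # hypothetical additional script
    "001_create_databases.sql.template",  # script shipped with the stack
    "README.md",                          # ignored: unsupported suffix
]

for name in execution_order(files):
    print(name)
```

A zero-padded numeric prefix keeps the order predictable, so a script that creates users can be made to run after the one that creates the databases.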

The script that controls this templating is volumes/stroom-all-dbs/init/000_stroom_init.sh. This script MUST NOT have its executable bit set, otherwise it will be executed rather than sourced by the MySQL entry point scripts and will then not work.