1 - Common Configuration
Configuration common to Stroom and Stroom-Proxy.
config.yml
This YAML file, sometimes known as the Dropwizard configuration file (as it conforms to a structure defined by Dropwizard) is the primary means of configuring Stroom/Stroom-Proxy.
As a minimum this file should be used to configure anything that needs to be set before stroom can start up, e.g. web server, logging, database connection details, etc.
It is also used to configure anything that is specific to a node in a stroom cluster.
If you are using some form of scripted deployment, e.g. Ansible, then it can be used to set all stroom properties for the environment that stroom runs in.
If you are not using scripted deployments then you can maintain stroom’s node agnostic configuration properties via the user interface.
Config File Structure
This file contains both the Dropwizard configuration settings (settings for ports, paths and application logging) and the Stroom/Stroom-Proxy application specific properties configuration.
The file is in YAML format and the application properties are located under the appConfig key.
For details of the Dropwizard configuration structure, see the Dropwizard configuration reference.
The file is split into sections using these keys:
- server - Configuration of the web server, e.g. ports, paths, request logging.
- logging - Configuration of application logging.
- jerseyClients - Configuration of the various Jersey HTTP clients in use. See Jersey HTTP Client Configuration.
- Application specific configuration:
  - appConfig - The Stroom configuration properties. These properties can be viewed/modified in the user interface.
  - proxyConfig - The Stroom-Proxy configuration properties. Stroom-Proxy has no user interface, so these properties can only be set in this file.
The following is an example of the YAML configuration file for Stroom:
# Dropwizard configuration section
server:
  # e.g. ports and paths
logging:
  # e.g. logging levels/appenders
jerseyClients:
  DEFAULT:
    # Configuration of the named client
# Stroom properties configuration section
appConfig:
  commonDbDetails:
    connection:
      jdbcDriverClassName: ${STROOM_JDBC_DRIVER_CLASS_NAME:-com.mysql.cj.jdbc.Driver}
      jdbcDriverUrl: ${STROOM_JDBC_DRIVER_URL:-jdbc:mysql://localhost:3307/stroom?useUnicode=yes&characterEncoding=UTF-8}
      jdbcDriverUsername: ${STROOM_JDBC_DRIVER_USERNAME:-stroomuser}
      jdbcDriverPassword: ${STROOM_JDBC_DRIVER_PASSWORD:-stroompassword1}
  contentPackImport:
    enabled: true
  ...
The following is an example of the YAML configuration file for Stroom-Proxy:
# Dropwizard configuration section
server:
  # e.g. ports and paths
logging:
  # e.g. logging levels/appenders
jerseyClients:
  DEFAULT:
    # Configuration of the named client
# Stroom-Proxy properties configuration section
proxyConfig:
  path:
    home: /some/path
  ...
appConfig Section
The appConfig section is special as it maps to the Properties seen in the Stroom user interface, so values can be managed in the file or via the Properties screen in the Stroom UI.
The other sections of the file can only be managed via the YAML file.
In the Stroom user interface, properties are named with a dot notation key, e.g. stroom.contentPackImport.enabled.
Each part of the dot notation property name represents a key in the YAML file, e.g. for this example, the location in the YAML would be:
appConfig:
  contentPackImport:
    enabled: true # stroom.contentPackImport.enabled
The stroom part of the dot notation name is replaced with appConfig.
For more details on the link between this YAML file and Stroom Properties, see Properties.
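The mapping from a dot notation property name to a location in the YAML can be sketched in a few lines of Python. This is only an illustration of the naming rule described above, not Stroom's own code: the function names are hypothetical and a plain dict stands in for the parsed YAML file.

```python
from typing import Any


def property_to_yaml_path(property_name: str) -> list[str]:
    """Convert a UI property name into the list of YAML keys that locate it."""
    parts = property_name.split(".")
    if parts[0] != "stroom":
        raise ValueError(f"Expected a name starting with 'stroom': {property_name}")
    # The leading 'stroom' part is replaced with the 'appConfig' root key.
    return ["appConfig"] + parts[1:]


def lookup(config: dict[str, Any], property_name: str) -> Any:
    """Walk the nested config dict one YAML key at a time."""
    node: Any = config
    for key in property_to_yaml_path(property_name):
        node = node[key]
    return node


# A dict standing in for the parsed config.yml (hypothetical content).
config = {"appConfig": {"contentPackImport": {"enabled": True}}}

print(property_to_yaml_path("stroom.contentPackImport.enabled"))
print(lookup(config, "stroom.contentPackImport.enabled"))
```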
Variable Substitution
The YAML configuration file supports Bash style variable substitution in the form of:
${ENV_VAR_NAME:-value_if_not_set}
This allows values to be set either directly in the file or via an environment variable, e.g.
jdbcDriverClassName: ${STROOM_JDBC_DRIVER_CLASS_NAME:-com.mysql.cj.jdbc.Driver}
In the above example, if the STROOM_JDBC_DRIVER_CLASS_NAME environment variable is not set then the value com.mysql.cj.jdbc.Driver will be used instead.
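The substitution rule can be illustrated with a minimal Python sketch. It is not the substitution code used by Stroom or Dropwizard; the function name is hypothetical, the default is applied only when the variable is absent from the supplied environment, and raising an error for an unset variable with no default is an assumption made for the example.

```python
import re

# Matches ${NAME} or ${NAME:-default}
_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def substitute(text: str, env: dict[str, str]) -> str:
    """Replace each ${NAME:-default} with env[NAME], or the default if NAME is not set."""

    def replace(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        # Assumption for this sketch: an unset variable with no default is an error.
        raise KeyError(f"{name} is not set and has no default")

    return _PATTERN.sub(replace, text)


line = "jdbcDriverClassName: ${STROOM_JDBC_DRIVER_CLASS_NAME:-com.mysql.cj.jdbc.Driver}"

# Variable not set, so the default after ':-' is used.
print(substitute(line, {}))
# Variable set, so its value wins over the default (the driver name here is made up).
print(substitute(line, {"STROOM_JDBC_DRIVER_CLASS_NAME": "org.example.Driver"}))
```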
Typed Values
YAML supports typed values rather than just strings, see https://yaml.org/refcard.html.
YAML understands booleans, strings, integers, floating point numbers, as well as sequences/lists and maps.
Some properties will be represented differently in the user interface to the YAML file.
This is due to how values are stored in the database and how the current user interface works.
This will likely be improved in future versions.
For details of how different types are represented in the YAML and the UI, see Data Types.
Server configuration
The server section controls the configuration of the Jetty web server.
For full details of how to configure the server section, see the Dropwizard configuration reference.
The following is an example of the configuration for an application listening on HTTP.
server:
  # The base path for the main application
  applicationContextPath: "/"
  # The base path for the admin pages/API
  adminContextPath: "/stroomAdmin"
  # The scheme/port for the main application
  applicationConnectors:
    - type: http
      port: 8080
      # Uses X-Forwarded-*** headers in request log instead of proxy server details.
      useForwardedHeaders: true
  # The scheme/port for the admin pages/API
  adminConnectors:
    - type: http
      port: 8081
      useForwardedHeaders: true
Jersey HTTP Client Configuration
Stroom and Stroom Proxy use the Jersey client for making HTTP connections with other nodes or other systems (e.g. Open ID Connect identity providers).
In the YAML file, the jerseyClients key controls the configuration of the various clients in use.
To allow complete control of the client configuration, Stroom uses the concept of named client configurations.
Each named client will be unique to a destination (where a destination is typically a server or a cluster of functionally identical servers).
Thus the connections to each of those destinations can be configured independently.
The client names are as follows:
- AWS_PUBLIC_KEYS - Connections to fetch AWS public keys used in Open ID Connect authentication.
- CONTENT_SYNC - Connections to downstream proxy/stroom instances to sync content (Stroom Proxy only).
- DEFAULT - The default client configuration used if a named configuration is not present.
- FEED_STATUS - Connections to downstream proxy/stroom instances to check feed status (Stroom Proxy only).
- OPEN_ID - Connections to an Open ID Connect identity provider, e.g. Cognito, Azure AD, KeyCloak, etc.
- STROOM - Inter-node communications within the Stroom cluster (Stroom only).
Note
If a named configuration does not exist then the configuration for DEFAULT will be used.
If DEFAULT is not defined in the configuration then the Dropwizard defaults will be used.
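The fallback order in the note above can be sketched as follows. This is an illustrative Python model rather than Stroom's implementation: the function name is hypothetical, and it assumes the fallback applies to a whole named block (no per-key merging between a named client and DEFAULT), which the text does not spell out.

```python
from typing import Any, Optional


def resolve_client_config(
    jersey_clients: dict[str, dict[str, Any]], client_name: str
) -> Optional[dict[str, Any]]:
    """Pick the configuration block for a named client.

    Order: the named block, then DEFAULT, then None
    (None meaning 'use the Dropwizard defaults').
    """
    if client_name in jersey_clients:
        return jersey_clients[client_name]
    if "DEFAULT" in jersey_clients:
        return jersey_clients["DEFAULT"]
    return None


# A dict standing in for the parsed 'jerseyClients' section.
clients = {
    "DEFAULT": {"timeout": "500ms"},
    "STROOM": {"timeout": "30s"},
}

print(resolve_client_config(clients, "STROOM"))   # named block exists
print(resolve_client_config(clients, "OPEN_ID"))  # falls back to DEFAULT
print(resolve_client_config({}, "OPEN_ID"))       # nothing configured
```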
The following is an example of how the clients are configured in the YAML file:
jerseyClients:
  DEFAULT:
    # Default client configuration, e.g.
    timeout: 500ms
  STROOM:
    # Configuration items for stroom inter-node communications
    timeout: 30s
  # etc.
The configuration keys (along with their default values and descriptions) for each client can be found in the Dropwizard HTTP client configuration documentation.
The following is another example including most keys:
jerseyClients:
  DEFAULT:
    minThreads: 1
    maxThreads: 128
    workQueueSize: 8
    gzipEnabled: true
    gzipEnabledForRequests: true
    chunkedEncodingEnabled: true
    timeout: 500ms
    connectionTimeout: 500ms
    timeToLive: 1h
    cookiesEnabled: false
    maxConnections: 1024
    maxConnectionsPerRoute: 1024
    keepAlive: 0ms
    retries: 0
    userAgent: <application name> (<client name>)
    proxy:
      host: 192.168.52.11
      port: 8080
      scheme: http
      auth:
        username: secret
        password: stuff
        authScheme: NTLM
        realm: realm
        hostname: host
        domain: WINDOWSDOMAIN
        credentialType: NT
      nonProxyHosts:
        - localhost
        - '192.168.52.*'
        - '*.example.com'
    tls:
      protocol: TLSv1.2
      provider: SunJSSE
      verifyHostname: true
      keyStorePath: /path/to/file
      keyStorePassword: changeit
      keyStoreType: JKS
      trustStorePath: /path/to/file
      trustStorePassword: changeit
      trustStoreType: JKS
      trustSelfSignedCertificates: false
      supportedProtocols: TLSv1.1,TLSv1.2
      supportedCipherSuites: TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256
      certAlias: alias-of-specific-cert
Note
Duration values in the Jersey client configuration blocks are different to Stroom Durations defined in Stroom properties.
They are defined as a numeric value and a unit suffix.
Typical suffixes are (in ascending order): ns, us, ms, s, m, h, d.
ISO 8601 duration strings are NOT supported, nor are values without a suffix.
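As a rough illustration of this format, the following Python sketch parses a value made of a number and one of the suffixes listed above. It is not the parser used by Dropwizard: the function name is hypothetical, only the seven suffixes named in the note are handled (no aliases), and no whitespace is accepted between the number and the suffix.

```python
import re

# Nanoseconds per unit, for the suffixes listed in the note above.
_NANOS_PER_UNIT = {
    "ns": 1,
    "us": 1_000,
    "ms": 1_000_000,
    "s": 1_000_000_000,
    "m": 60 * 1_000_000_000,
    "h": 3_600 * 1_000_000_000,
    "d": 86_400 * 1_000_000_000,
}

# A whole number immediately followed by a lower case unit suffix.
_DURATION = re.compile(r"^(\d+)([a-z]+)$")


def parse_duration_nanos(value: str) -> int:
    """Return the duration in nanoseconds, or raise ValueError if it is malformed."""
    match = _DURATION.match(value.strip())
    if match is None:
        raise ValueError(f"Not a '<number><suffix>' duration: {value!r}")
    amount, unit = match.groups()
    if unit not in _NANOS_PER_UNIT:
        raise ValueError(f"Unknown duration suffix {unit!r} in {value!r}")
    return int(amount) * _NANOS_PER_UNIT[unit]


print(parse_duration_nanos("500ms"))
print(parse_duration_nanos("1h"))
```

With this sketch, an ISO 8601 string such as PT30S or a bare number such as 30 is rejected with a ValueError, mirroring the restriction described in the note.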
Note
The paths used for the key and trust stores will be treated in the same way as Stroom property paths, i.e. relative to stroom.home if relative and supporting variable substitution.
Logging Configuration
The Dropwizard configuration file controls all the logging by the application.
In addition to the main application log, there are additional logs such as stroom user events (for audit), Stroom-Proxy send and receive logs and database migration logs.
For full details of the logging configuration, see Dropwizard Logging Configuration.
Request Log
The request log is slightly different to the other logs.
It logs all requests to the web server.
It is configured in the server section.
The property archivedLogFilenamePattern controls rolling of the active log file.
The date pattern in the filename controls the frequency that the log files are rolled.
In this example, files will be rolled every minute.
server:
  requestLog:
    appenders:
      - type: file
        currentLogFilename: logs/access/access.log
        discardingThreshold: 0
        # Rolled and gzipped every minute
        archivedLogFilenamePattern: logs/access/access-%d{yyyy-MM-dd'T'HH:mm}.log.gz
        archivedFileCount: 10080
        logFormat: '%h %l "%u" [%t] "%r" %s %b "%i{Referer}" "%i{User-Agent}" %D'
Logback Logs
Dropwizard uses Logback for application level logging.
All logs in Stroom and Stroom-Proxy apart from the request log are Logback based logs.
Logback uses the concept of Loggers and Appenders.
A Logger is a named thing that produces log messages.
An Appender is an output that a Logger can append its log messages to.
Typical Appenders are:
- File - appends messages to a file that may or may not be rolled.
- Console - appends messages to stdout.
- Syslog - appends messages to syslog.
Loggers
A Logger can append to more than one Appender if required.
For example, the default configuration file for Stroom has two appenders for the application logs.
The rolled files from one appender are POSTed to Stroom so that it can index its own logs, and are then deleted.
The files from the other appender are intended to remain on the server until archived off, to allow viewing by an administrator.
A Logger can be configured with a severity; valid severities are TRACE, DEBUG, INFO, WARN and ERROR.
The severity set on a logger means that only messages with that severity or higher will be logged.
Logger names are typically the name of the Java class that is producing the log message.
You don’t need to understand too much about Java classes as you are only likely to change logger severities when requested by one of the developers.
Some loggers, such as event-logger, do not have a Java class name.
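The way a severity threshold applies to a logger name can be sketched in Python. This models standard Logback behaviour, where a logger without its own level inherits the level of its nearest configured ancestor in the dot separated name hierarchy, falling back to the root level. The function names and the class names used below are hypothetical and chosen only for illustration.

```python
# Severities in ascending order.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR"]


def effective_level(logger_name: str, configured: dict[str, str], root_level: str) -> str:
    """Find the level of the nearest configured ancestor, else the root level."""
    name = logger_name
    while name:
        if name in configured:
            return configured[name]
        # Drop the last dot separated part, e.g. 'stroom.a.B' -> 'stroom.a'.
        name = name.rpartition(".")[0]
    return root_level


def is_logged(message_level: str, logger_name: str,
              configured: dict[str, str], root_level: str) -> bool:
    """A message is logged if its severity is at or above the logger's threshold."""
    threshold = effective_level(logger_name, configured, root_level)
    return LEVELS.index(message_level) >= LEVELS.index(threshold)


# Stands in for the 'loggers' section, with root 'level: WARN'.
configured = {"stroom": "INFO", "stroom.pipeline": "DEBUG"}

print(is_logged("DEBUG", "stroom.pipeline.SomeClass", configured, "WARN"))
print(is_logged("DEBUG", "stroom.security.OtherClass", configured, "WARN"))
print(is_logged("INFO", "org.example.Thing", configured, "WARN"))
```

Setting DEBUG on a narrow package, as in the second configured entry, is the usual way to get detailed output for one area without raising the log volume for the whole application.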
As an example, this is a portion of a Stroom config.yml file that illustrates the different loggers/appenders:
logging:
  # This is root logging severity level for all loggers. Only messages >= to WARN will be logged unless overridden
  # for a specific logger
  level: WARN
  # All the named loggers
  loggers:
    # Logs useful information about stroom. Only set DEBUG on specific 'stroom' classes or packages
    # due to the large volume of logs that would be produced for all of 'stroom' in DEBUG.
    stroom: INFO
    # Logs useful information about dropwizard when booting stroom
    io.dropwizard: INFO
    # Logs useful information about the jetty server when booting stroom
    org.eclipse.jetty: INFO
    # Logs REST request/responses with headers/payloads. Set this to OFF to disable that logging.
    org.glassfish.jersey.logging.LoggingFeature: INFO
    # Logs summary information about FlyWay database migrations
    org.flywaydb: INFO
    # Logger and custom appender for audit logs
    event-logger:
      level: INFO
      # Prevents messages from this logger from being sent to other appenders
      additive: false
      appenders:
        - type: file
          currentLogFilename: logs/user/user.log
          discardingThreshold: 0
          # Rolled every minute
          archivedLogFilenamePattern: logs/user/user-%d{yyyy-MM-dd'T'HH:mm}.log
          # Minute rolled logs older than a week will be deleted. Note rolled logs are deleted
          # based on the age of the window they contain, not the number of them. This value should be greater
          # than the maximum time stroom is not producing events for.
          archivedFileCount: 10080
          logFormat: "%msg%n"
    # Logger and custom appender for the flyway DB migration SQL output
    org.flywaydb.core.internal.sqlscript:
      level: DEBUG
      additive: false
      appenders:
        - type: file
          currentLogFilename: logs/migration/migration.log
          discardingThreshold: 0
          # Rolled every day
          archivedLogFilenamePattern: logs/migration/migration-%d{yyyy-MM-dd}.log
          archivedFileCount: 10
          logFormat: "%-6level [%d{\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",UTC}] [%t] %logger - %X{code} %msg %n"
Appenders
The following is an example of the default appenders that will be used for all loggers unless they have their own custom appender configured.
logging:
  # Appenders for all loggers except for where a logger has a custom appender configured
  appenders:
    # stdout
    - type: console
      # Multi-coloured log format for console output
      logFormat: "%highlight(%-6level) [%d{\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",UTC}] [%green(%t)] %cyan(%logger) - %X{code} %msg %n"
      timeZone: UTC
    #
    # Minute rolled files for stroom/datafeed, will be curl'd/deleted by stroom-log-sender
    - type: file
      currentLogFilename: logs/app/app.log
      discardingThreshold: 0
      # Rolled and gzipped every minute
      archivedLogFilenamePattern: logs/app/app-%d{yyyy-MM-dd'T'HH:mm}.log.gz
      # One week using minute files
      archivedFileCount: 10080
      logFormat: "%-6level [%d{\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",UTC}] [%t] %logger - %X{code} %msg %n"
Log Rolling
Rolling of log files can be done based on the size of the file or on time.
The archivedLogFilenamePattern property controls the rolling behaviour.
The rolling policy is determined from the filename pattern, e.g. a pattern with a minute precision date format will be rolled every minute.
The following is an example of an appender that rolls based on the size of the log file:
- type: file
  currentLogFilename: logs/app.log
  # The name pattern, where %i is a sequential number indicating age, where 1 is the most recent
  archivedLogFilenamePattern: logs/app-%i.log
  # The maximum number of rolled files to keep
  archivedFileCount: 10
  # The maximum size of a log file
  maxFileSize: "100MB"
  logFormat: "%-6level [%d{\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",UTC}] [%t] %logger - %X{code} %msg %n"
The following is an example of an appender that rolls every minute to gzipped files:
- type: file
  currentLogFilename: logs/app/app.log
  # Rolled and gzipped every minute
  archivedLogFilenamePattern: logs/app/app-%d{yyyy-MM-dd'T'HH:mm}.log.gz
  # One week using minute files
  archivedFileCount: 10080
  logFormat: "%-6level [%d{\"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'\",UTC}] [%t] %logger - %X{code} %msg %n"
Warning
Log file rolling is event based, so a file will only roll when a new message arrives that would require a roll to happen.
This means that if the application is idle for a long period with no log output then the un-rolled file will remain active until a new message arrives to trigger it to roll.
For example, if Stroom is unused overnight, then the file holding the last log messages from the night before will not be rolled until new messages arrive in the morning.
For this reason, archivedFileCount should be set to a value that covers a period greater than the maximum time the application may be idle, else rolled log files may be deleted as soon as they are rolled.
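The relationship between the roll period, the retention period and archivedFileCount is simple arithmetic, sketched below in Python. The function names are hypothetical. The sketch relies on the behaviour described in the example configuration comments, namely that time based rolled files are deleted by the age of the window they cover.

```python
import math
from datetime import timedelta


def archived_file_count(retention: timedelta, roll_period: timedelta) -> int:
    """Number of rolled files needed to cover the retention period."""
    return math.ceil(retention / roll_period)


def covers_idle_time(count: int, roll_period: timedelta, max_idle: timedelta) -> bool:
    """True if the retained window is longer than the longest expected idle period."""
    return count * roll_period > max_idle


minute = timedelta(minutes=1)

# One week of minute rolled files: 7 * 24 * 60 = 10080.
week_of_minutes = archived_file_count(timedelta(weeks=1), minute)
print(week_of_minutes)

# A one week window comfortably covers a weekend with no log output.
print(covers_idle_time(week_of_minutes, minute, timedelta(days=3)))

# Keeping only 60 minute files would not cover an idle night of 12 hours.
print(covers_idle_time(60, minute, timedelta(hours=12)))
```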
2 - Stroom Configuration
Describes how the Stroom application is configured.
General configuration
The Stroom application is essentially just an executable JAR file that can be run when provided with a configuration file, config.yml.
This config file is common to all forms of deployment.
config.yml
Stroom operates on a configuration by exception basis so all configuration properties will have a sensible default value and a property only needs to be explicitly configured if the default value is not appropriate, e.g. for tuning a large scale production deployment or where values are environment specific.
As a result config.yml only contains a minimal set of properties.
The full tree of properties can be seen in ./config/config-defaults.yml and a schema for the configuration tree (along with descriptions for each property) can be found in ./config/config-schema.yml.
These two files can be used as a reference when configuring stroom.
Key Configuration Properties
The following are key properties that would typically be changed for a production deployment.
All configuration branches are relative to the appConfig root.
The database name(s), hostname(s), port(s), username(s) and password(s) should be configured using these properties.
Typically stroom is configured to keep its statistics data in a separate database to the main stroom database, as is configured below.
commonDbDetails:
  connection:
    jdbcDriverUrl: "jdbc:mysql://localhost:3307/stroom?useUnicode=yes&characterEncoding=UTF-8"
    jdbcDriverUsername: "stroomuser"
    jdbcDriverPassword: "stroompassword1"
statistics:
  sql:
    db:
      connection:
        jdbcDriverUrl: "jdbc:mysql://localhost:3307/stats?useUnicode=yes&characterEncoding=UTF-8"
        jdbcDriverUsername: "statsuser"
        jdbcDriverPassword: "stroompassword1"
In a clustered deployment each node must be given a node name that is unique within the cluster.
This is used to identify nodes in the Nodes screen.
It could be the hostname of the node or follow some other naming convention.
node:
  name: "node1a"
Each node should have its identity on the network configured so that it uses the appropriate FQDNs.
The nodeUri hostname is the FQDN of each node and is used by nodes to communicate with each other, therefore it can be private to the cluster of nodes.
The publicUri hostname is the public facing FQDN for stroom, i.e. the address of a load balancer or Nginx.
This is the address that users will use in their browser.
nodeUri:
  hostname: "localhost" # e.g. node5.stroomnodes.somedomain
publicUri:
  hostname: "localhost" # e.g. stroom.somedomain
Deploying without Docker
When running without Docker, Stroom is configured using two files.
The following locations are relative to the stroom home directory, i.e. the root of the distribution zip.
- ./config/config.yml - Stroom configuration YAML file
- ./config/scripts.env - Stroom scripts configuration env file
The distribution also includes these files which are helpful when it comes to configuring stroom.
- ./config/config-defaults.yml - Full version of the config.yml file containing all branches/leaves with default values set. Useful as a reference for the structure and the default values.
- ./config/config-schema.yml - The schema defining the structure of the config.yml file.
scripts.env
This file is used by the various shell scripts like start.sh, stop.sh, etc.
This file should not need to be changed unless you want to change the locations where certain log files are written to or need to change the Java memory settings.
In a production system it is highly likely that you will need to increase the Java heap size as the default is only 2G.
The heap size settings and any other Java command line options can be set by changing:
JAVA_OPTS="-Xms512m -Xmx2048m"
As part of a docker stack
When stroom is run as part of one of our docker stacks, e.g. stroom_core, there are some additional layers of configuration to take into account, but the configuration is still primarily done using the config.yml file.
Stroom’s config.yml file is found in the stack in ./volumes/stroom/config/ and this is the primary means of configuring Stroom.
The stack also ships with a default config.yml file baked into the docker image.
This minimal fallback file (located in /stroom/config-fallback/ inside the container) will be used in the absence of one provided in the docker stack configuration (./volumes/stroom/config/).
The default config.yml file uses environment variable substitution so some configuration items will be set by environment variables set into the container by the stack env file and the docker-compose YAML.
This approach is useful for configuration values that need to be used by multiple containers, e.g. the public FQDN of Nginx, so it can be configured in one place.
If you need to further customise the stroom configuration then it is recommended to edit the ./volumes/stroom/config/config.yml file.
This can either be a simple file with hard coded values or one that uses environment variables for some of its configuration items.
The configuration works as follows:
env file (stroom<stack name>.env)
|
|
| environment variable substitution
|
v
docker compose YAML (01_stroom.yml)
|
|
| environment variable substitution
|
v
Stroom configuration file (config.yml)
Ansible
If you are using Ansible to deploy a stack then it is recommended that all of stroom’s configuration properties are set directly in the config.yml file using a templated version of the file and to NOT use any environment variable substitution.
When using Ansible, the Ansible inventory is the single source of truth for your configuration so not using environment variable substitution for stroom simplifies the configuration and makes it clearer when looking at deployed configuration files.
Stroom-ansible has an example inventory for a single node stroom stack deployment.
The group_vars/all file shows how values can be set into the env file.
3 - Stroom Proxy Configuration
Describes how the Stroom-Proxy application is configured.
The configuration of Stroom-proxy is very much the same as for Stroom with the only difference being the structure of the application specific part of the config.yml file.
Stroom-proxy has a proxyConfig key in the YAML while Stroom has appConfig.
General configuration
The Stroom-proxy application is essentially just an executable JAR file that can be run when provided with a configuration file, config.yml.
This configuration file is common to all forms of deployment.
config.yml
Stroom-proxy does not have a user interface so the config.yml file is the only way of configuring stroom-proxy.
As with stroom, the config.yml file is split into three sections using these keys:
- server - Configuration of the web server, e.g. ports, paths, request logging.
- logging - Configuration of application logging.
- proxyConfig - Stroom-Proxy specific configuration.
See also Properties for more details on structure of the config.yml file and supported data types.
Stroom-Proxy operates on a configuration by exception basis so all configuration properties will have a sensible default value and a property only needs to be explicitly configured if the default value is not appropriate, e.g. for tuning a large scale production deployment or where values are environment specific.
As a result config.yml only contains a minimal set of properties.
The full tree of properties can be seen in ./config/config-defaults.yml and a schema for the configuration tree (along with descriptions for each property) can be found in ./config/config-schema.yml.
These two files can be used as a reference when configuring stroom-proxy.
Key Configuration Properties
Stroom-proxy has two main functions, storing and forwarding.
It can be configured to do either or both of these functions.
These functions are enabled/disabled using:
proxyConfig:
  # The list of named destinations that Stroom-Proxy will forward to
  forwardHttpDestinations:
    - enabled: true
      name: "downstream"
      forwardUrl: "https://some-host/stroom/datafeed"
  # Whether to store received data in a repository
  repository:
    storingEnabled: true
  # If we are storing data in a proxy repository we can aggregate it before forwarding.
  aggregator:
    maxItemsPerAggregate: 1000
    maxUncompressedByteSize: "1G"
    maxAggregateAge: 10m
    aggregationFrequency: 1m
Stroom-proxy should be configured to check the receipt status of feeds on receipt of data.
This is done by configuring the end point of a downstream stroom-proxy or stroom.
feedStatus:
  url: "http://stroom:8080/api/feedStatus/v1"
  apiKey: ""
The url should be the URL for the feed status API on the downstream stroom(-proxy).
If this is on the same host then you can use the http endpoint, however if it is on a remote host then you should use https and the host of its nginx, e.g. https://downstream-instance/api/feedStatus/v1.
In order to use the API, the proxy must have a configured apiKey.
The API key must be created in the downstream stroom instance and then copied into this configuration.
If the proxy is configured to forward data then the forward destination(s) should be set.
This is the datafeed endpoint of the downstream stroom-proxy or stroom instance that data will be forwarded to.
This may also be the address of a load balancer or similar that is fronting a cluster of stroom-proxy or stroom instances.
See also Feed status certificate configuration.
forwardHttpDestinations:
  - enabled: true
    name: "downstream"
    forwardUrl: "https://some-host/stroom/datafeed"
forwardUrl specifies the URL of the datafeed endpoint on the destination host.
Each forward location can use a different key/trust store pair.
See also Forwarding certificate configuration.
If the proxy is configured to store, then the location of the proxy repository may need to be configured if it needs to be in a different location to the proxy home directory, e.g. on another mount point.
Deploying without Docker
Apart from the structure of the config.yml file, the configuration in a non-docker environment is the same as for stroom.
As part of a docker stack
The way stroom-proxy is configured is essentially the same as for stroom with the only real difference being the structure of the config.yml file as noted above.
As with stroom, the docker stack comes with a ./volumes/stroom-proxy-*/config/config.yml file that will be used in the absence of a provided one.
Also as with stroom, the config.yml file supports environment variable substitution so can make use of environment variables set in the stack env file and passed down via the docker-compose YAML files.
Certificates
Stroom-proxy makes use of client certificates for two purposes:
- Communicating with a downstream stroom/stroom-proxy in order to establish the receipt status for the feeds it has received data for.
- When forwarding data to a downstream stroom/stroom-proxy.
The stack comes with the following files that can be used for demo/test purposes.
- volumes/stroom-proxy-*/certs/ca.jks
- volumes/stroom-proxy-*/certs/client.jks
For a production deployment these will need to be changed, see Certificates.
Feed status certificate configuration
The configuration of the client certificates for feed status checks is done using the FEED_STATUS jersey client configuration.
See Stroom and Stroom-Proxy Common Configuration.
Forwarding certificate configuration
Stroom-proxy can forward to multiple locations.
The configuration of the certificate(s) for the forwarding locations is as follows:
proxyConfig:
  forwardHttpDestinations:
    - enabled: true
      name: "downstream"
      forwardUrl: "https://some-host/stroom/datafeed"
      sslConfig:
        keyStorePath: "/stroom-proxy/certs/client.jks"
        keyStorePassword: "password"
        keyStoreType: "JKS"
        trustStorePath: "/stroom-proxy/certs/ca.jks"
        trustStorePassword: "password"
        trustStoreType: "JKS"
        hostnameVerificationEnabled: true
forwardUrl specifies the URL of the datafeed endpoint on the destination host.
Each forward location can use a different key/trust store pair.