Developer Guide

This section is intended for developers contributing to the development of the Stroom software (and its related applications). It describes how to do some common tasks, such as releasing and building documentation. It also covers the development of any supporting scripts or utilities for running/administering Stroom, e.g. Ansible playbooks, Helm charts, etc.

1: Software Stack
2: Running Stroom in an IDE
3: Components
4: Contributing
5: Release Process

5.1: Releasing Stroom

6: Setting up releases to Sonatype & Maven Central

1 - Software Stack

The software stack used by the Stroom family of products.

Stroom and Stroom Proxy

Stroom and Stroom Proxy live in the same repository, share some common code and are built by the same Gradle build.

Languages and key frameworks

Java 15 - The language for the core application
- Dropwizard - A RESTful framework incorporating embedded Jetty.
- Junit 5
- SLF4j and Logback
- Mockito
- Jooq - Generates Java code for type safe SQL.
- Apache Lucene - The search library used by Stroom’s indexes and dashboard queries.
- Lightning Memory Mapped Database - Used for memory mapped persistent reference and search data stores.
React - Some of the new UI screens.
- Typescript

Build and development tools

Gradle - Building the Java application and orchestrating related sub-builds, e.g. npm.
Github Actions - The CI build and release.
Bash - Various utility shell scripts.
Docker - Building the stroom and stroom-proxy docker images.
Docker Compose -
Docker containers - Provide consistent build environments for
- Java
- npm
- Plant UML
npm - For the build of the new React based UI screens.

Services

Nginx - Used for SSL termination, load balancing and reverse proxying.
MySQL - Database for persistence in Stroom.

2 - Running Stroom in an IDE

How to run Stroom in an Integrated Development Environment, e.g. IntelliJ

We tend to use IntelliJ as our Java IDE of choice. This is a guide for running Stroom in IntelliJ for the purposes of developing/debugging Stroom.

Prerequisites

In order to build/run/debug Stroom you will need the following:

OpenJDK 15
Git
Gradle
IntelliJ
Docker CE
Docker Compose

These instructions assume that all services will either run in the IDE or in Docker containers.

We develop on Linux, so if you are running on a Mac you may experience issues with some of our shell scripts. To run the various shell scripts in our repositories you are advised to install:

bash 4+
jq
GNU grep
GNU sed
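The following is a minimal sketch of how you might check these tools before running the repository scripts. The function names check_tool and is_gnu are made up for this illustration and are not part of any Stroom repository.

```shell
# Report whether a command is available on the PATH.
# Prints a status line and returns 0 if found, 1 if not.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok      $1"
  else
    echo "MISSING $1"
    return 1
  fi
}

# Returns 0 if the tool identifies itself as the GNU version,
# e.g. "grep (GNU grep) 3.7" or "sed (GNU sed) 4.8".
is_gnu() {
  "$1" --version 2>/dev/null | grep -q "(GNU $1)"
}

for tool in bash jq grep sed; do
  check_tool "$tool" || true
done

for tool in grep sed; do
  if is_gnu "$tool"; then
    echo "ok      $tool is the GNU version"
  else
    echo "WARNING $tool does not look like the GNU version"
  fi
done

# Ask the installed bash for its major version (bash 4+ is advised).
bash_major="$(bash -c 'echo "${BASH_VERSINFO[0]}"' 2>/dev/null || echo 0)"
if [ "${bash_major:-0}" -ge 4 ]; then
  echo "ok      bash major version is $bash_major"
else
  echo "WARNING bash 4+ is advised, found major version ${bash_major:-0}"
fi
```

The script only reports what it finds and never exits with an error, so it is safe to run on any machine.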

Stroom git repositories

To develop Stroom you will need to clone/fork multiple git repositories. To quickly clone all of the Stroom repositories you can use the helper script described in stroom-resources.

Database setup

Stroom requires a MySQL database to run. You can either point stroom at a local MySQL server or use the MySQL Docker container from stroom-resources.

MySQL in a Docker container

See the section below on stroom-resources.

Host based MySQL server

With an instance of MySQL server 8.0 running on your local machine do the following to create the stroom database:

# log into your MySQL server using your root credentials
mysql --user=root --password=myrootpassword

Then run the following commands in the MySQL shell:

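drop database if exists stroom;
create database stroom;
-- In MySQL 8.0 the user must be created before privileges are granted
create user if not exists 'stroomuser'@'localhost' identified by 'stroompassword1';
grant all privileges on stroom.* to 'stroomuser'@'localhost';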
quit;

Local configuration file

When running Stroom in an IDE you need a local configuration file so that you can change settings locally without affecting the repository. The local configuration file lives in the root of the Stroom repository at ./local.yml.

To create a default version of this file run this script from within the root of the stroom git repository.

./local.yml.sh

This will create ./local.yml using stroom-app/dev.yml as a template. So that you can run a multi-node cluster it will also create ./local2.yml and ./local3.yml. These files are not source controlled so you can make any changes you like to them, e.g. setting log levels or altering Stroom property values.

stroom-resources

As a minimum to develop stroom you will need clones of the stroom and stroom-resources git repositories. stroom-resources provides the docker-compose configuration for running the many docker containers needed.

Having cloned stroom-resources navigate to the directory stroom-resources/bin and run the script

./bounceIt.sh -y

On first run this will create a default version of the git-ignored file stroom-resources/bin/local.env which is intended for use by developers to configure the docker stacks to run.

This file is used to set a number of environment variables that docker compose will use to configure the various containers. The key variable in there is SERVICE_LIST. This is a bash array that sets the services to run. By default it is set to run stroom-all-dbs (MySQL + database init scripts) and nginx which are sufficient for running Stroom in an IDE.
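As a sketch only, the relevant part of local.env might look like the following. The two service names are the defaults described above; the exact layout, quoting and comments of the generated file are assumptions, so check your own copy of local.env before editing it.

```shell
# stroom-resources/bin/local.env (illustrative excerpt, layout assumed)
# Bash array of the services to run.
SERVICE_LIST=(
  "stroom-all-dbs"   # MySQL plus the database init scripts
  "nginx"            # SSL termination and reverse proxying
)
```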

Verify the Gradle build

Before trying to run Stroom in an IDE it is worth performing a Gradle build to verify the code compiles and all dependencies are present. This command will run all parts of the build except for the tests which can take 20+mins to run. Some parts of the build are run inside docker containers (to remove the need to install additional dependencies) so on first run there will be an overhead of building the docker image layers. These layers will be cached which will speed up future builds.

./gradlew clean build -x test

Local or embedded MySQL

The Junit integration tests that need a database can either be run against the local MySQL (i.e. stroom-all-dbs) or an embedded MySQL instance.

Configuring the database used can be done with the JVM argument -DuseEmbeddedMySql=false, which can be set in Run/Debug Configurations => Edit configuration templates… => JUnit => VM options in IntelliJ. False will use your local MySQL instance, true will use the embedded one. The CI build uses the embedded MySQL.

The pros/cons of using the embedded instance are:

Pros

No dependency on stroom-resources to run the full build.

Cons

Requires the MySQL binaries to be downloaded, sometimes multiple times.
Consumes a lot of disk space if multiple instances are run.
Harder to debug tests as the database is destroyed at the end of the test.

Clearing down your environment

If you need to work from a clean slate and you are using the container based MySQL you can run the following:

Warning

This script will delete ALL containers running/stopped whether related to Stroom or not. It is essentially a clean slate for your docker environment. If you are running other unrelated containers, don’t run this.

It will also delete all stroom state held on the filesystem, i.e. the stream store and lucene index shards.

pwd
# prints /home/dev/git_work/stroom-resources/bin

./clean.sh \
&& rm -rf ~/tmp/stroom \
&& rm -rf /tmp/stroom \
&& rm -rf ~/.stroom/volumes \
&& rm -rf ~/.stroom/temp \
&& rm -rf ~/.stroom/logs \
&& rm -rf ~/.stroom/v7

Sample Data

When developing Stroom it is helpful to have Stroom run with pre-loaded content and data as by default it will be completely empty. SetupSampleData.java is a class that loads pre-defined content and data into the database and file system so that Stroom can begin processing data on boot. This sample data/content is very useful for manually testing and exercising the application in development. This class assumes that the database being used for Stroom is completely empty.

To run SetupSampleData use the pre-defined Run Configuration in IntelliJ called SetupSampleData. This will load content (e.g. XSLTs, Pipelines, etc.), create Feeds and load data into the Feeds.

You should now have a database and stream store populated with tables and data, providing you with some predefined feeds, data, translations, pipelines, dashboards, etc.

When Stroom is next started it will begin to process the data using the pre-defined pipelines.

Running Stroom from the IDE

The user interface for Stroom is built using GWT (see GWT Project for more information or GWT specific documentation). As a result Stroom needs to be started up with GWT Super Dev Mode. Super Dev Mode handles the on-the-fly compilation of the Java user interface source into JavaScript and the source map that links client JavaScript back to Java source for client side debugging.

The following steps for running and debugging Stroom in IDEA assume you have a MySQL database running on localhost:3307, with a database stroom and user stroomuser already created.

JAVA_HOME

Ensure environment variable JAVA_HOME is set and points to a valid JDK 15 directory

export JAVA_HOME=~/.jdks/openjdk-15.0.2

Alternatively, to simplify the process of installing and managing Java JDKs, consider using SDKMan.
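A quick way to confirm that JAVA_HOME points at the expected JDK is to ask it for its version. The sketch below assumes the usual first line of java -version output (for example openjdk version "15.0.2" 2021-01-19); the helper name java_major_version is hypothetical.

```shell
# Extract the major version from a Java version string,
# e.g. "15.0.2" -> 15 and the legacy form "1.8.0_292" -> 8.
java_major_version() {
  version="$1"
  case "$version" in
    1.*) version="${version#1.}" ;;   # strip the legacy "1." prefix
  esac
  # Keep only the leading digits
  echo "${version%%[!0-9]*}"
}

if [ -n "${JAVA_HOME:-}" ] && [ -x "${JAVA_HOME}/bin/java" ]; then
  # java -version writes to stderr; the version is the quoted part of line one
  raw="$("${JAVA_HOME}/bin/java" -version 2>&1 | head -n 1 | cut -d'"' -f2)"
  echo "JAVA_HOME provides Java $(java_major_version "$raw")"
else
  echo "JAVA_HOME is not set or does not point at a JDK"
fi
```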

Build `stroom-app`

NOTE: During development, it is helpful to skip running unit and integration tests, to speed up the build process:

./gradlew clean build -x test

Start a single Stroom node

Select the IDEA run configuration named Stroom GWT SuperDevMode
Click Debug. Stroom will start, with log output displayed in the Run pane at the bottom of the window.

This run configuration essentially sets the JVM argument -DgwtSuperDevMode=true to run the application in Super Dev Mode.

Watch the log output. Once you see a log INFO message containing the text “Started”, you will be able to launch the app in a browser from: https://localhost.

You will see the Stroom blue background, with a username/password prompt. Enter the following default credentials:

Username: admin
Password: admin

You can now interact with Stroom and set breakpoints in Java code. Note that setting breakpoints in any of the java code in modules suffixed with -client (i.e. client side GWT Java code) does not have any effect, as these components are compiled to static JavaScript. Breakpoints in modules ending -shared will only have an effect if you are debugging server side code.

Note

Stroom has been written with Google’s Chrome browser in mind so has only been tested on Chrome. Behaviour in other browsers may vary. We would like to improve cross-browser support so please let us know about any browser incompatibilities that you find.

Starting the Super Dev Mode Compiler

With the Stroom application running you need to also run a draft GWT compile and run the Super Dev Mode compiler.

On first use it is recommended to run:

./gradlew gwtClean :stroom-app-gwt:gwtDraftCompile :stroom-app-gwt:gwtSuperDevMode

This will ensure a clean state of the GWT compiled JavaScript. It may be necessary to re-run the clean and draft compile if there have been significant changes to the Java code or if there are problems running Stroom in Super Dev Mode.

Normally however you can just run:

./gradlew :stroom-app-gwt:gwtSuperDevMode

When this gradle task runs it will echo some instructions for how to set up your browser. Once the browser is all set up with the dev mode favorites you can visit Stroom at

http://localhost:8080 (bypassing Nginx)
https://localhost (via Nginx)

Running without Nginx is simpler but can hide problems with the Stroom/Nginx configuration/integration.

Authentication

In development you can either run Stroom with authentication on or off. It is a quicker development experience with authentication turned off but this can hide any problems with authentication flow.

To run Stroom with authentication turned off set the following in local.yml:

stroom:
  security:
    authentication:
      authenticationRequired: false

If you want to run with authentication but don’t want to be prompted to change the password on first boot you can set:

stroom:
  security:
    identity:
      passwordPolicy:
        forcePasswordChangeOnFirstLogin: false

Alternatively you can run the IntelliJ Run Configuration Stroom Reset Admin Password, which will reset the password to admin and prevent further prompts to change it.

Right click behaviour

Stroom overrides the default right click behaviour in the browser with its own context menu. For UI development it is often necessary to have access to the browser’s context menu, for example to inspect elements. To enable the browser’s context menu, ensure this property is set to null in dev.yml:

stroom:
  ui:
    oncontextmenu: null

To return it to its default value, set it to "return false;".

Hot loading GWT UI code changes

If you make any changes to the Java code in -client or -shared modules then in order for them to be hot loaded into the JavaScript code you simply need to refresh the browser. This will trigger Super Dev Mode to recompile any changed code.

If you make significant code changes, e.g. moving/renaming classes, then GWT can get confused so you may need to run the gwtDraftCompile and/or gwtClean gradle tasks followed by gwtSuperDevMode.

Debugging GWT UI code

To debug the GWT UI code you will need to use Chrome Dev Tools (shift+ctrl+i). Setting breakpoints in the UI code in IntelliJ will have no effect. SuperDevMode creates source maps that link the running javascript back to Java code that you can set break points in.

To find the Java source in Chrome Dev Tools open the Sources tab then in the left hand navigator pane (Page tab) select:

Top => ui => stroom (ui) => 127.0.0.1:9876 => sourcemaps/stroom => stroom

This folder then contains all the stroom java packages.

3 - Components

Stroom is broken down into separate components. Each component encapsulates a specific area of Stroom functionality and aids development by providing a single area of focus for new features and ensures separate components remain as loosely coupled as possible. Components can be tested in isolation and their interaction with other components easily understood by only allowing dependencies via minimal APIs.

Some examples of components in Stroom include

stroom-activity - Component for recording a user's actions against a current activity.
stroom-dictionary - Component for storing lists of words.
stroom-statistics - Component for recording statistical data, e.g. amount of data received in X minutes.

In the project structure a component appears as a first level subdirectory of the root project folder. Components have further subdirectories (modules) that make up the various parts of the component, e.g.

stroom - Root project
- stroom-activity - The component
  - stroom-activity-api - API module for stroom-activity
  - stroom-activity-impl - Implementation of the API and other module implementation code
  - stroom-activity-impl-db - Database persistence implementation used by impl
  - stroom-activity-impl-db-jooq - JOOQ generated classes used by stroom-activity-impl-db
  - stroom-activity-mock - Mock implementation for the stroom-activity API

Dependencies between a component's modules

The diagram below shows the dependencies between the different modules that make up a component as well as the internal dependencies within the impl module. The actual implementations used at runtime are determined by Guice bindings in whichever Guice modules are loaded by the application. Tests can bind mock implementations of a component's API just by using the Guice module within the mock module.

images/dev-guide/module-dependencies.puml.svg — Internal Component Dependencies

Dependencies between components

Typically a component will need to call out to other components to apply security constraints and to log user activity. These typical relationships are shown in the diagram below.

images/dev-guide/external-dependencies.puml.svg — External Component Dependencies

Component API, e.g. modules ending in `-api`

API layer

All communication between components in stroom must be made via a component’s API. The API provides the minimum surface area for communication between components and decouples dependencies between components to just the API code. For component testing purposes mock implementations of these APIs can be used to limit testing to just a single component.

Component API and service implementation, e.g. modules ending in `-impl`

Client interaction - REST services and GWT Action Handlers

The uppermost layer of the server side code services requests from the client. The client may make RESTful calls, as is the case for the new UI, or will use Actions that are handled by ActionHandlers, as is the case for the legacy GWT UI.

Since this layer deals with all client interaction it should be responsible for creating audit logs for all user activity, e.g. accessing documents, searching etc. No audit logging should need to be performed at a lower level within the application as deeper levels have less knowledge of user intent since they may just be playing a part in the wider request.

The client interaction layer adds no logic and asks the underlying service layer to service the encapsulated request away from the REST endpoint wrapping code or GWT action handler code. This allows multiple types of endpoint to use the same underlying service layer. If a request requires the use of multiple services to form a response, this must be handled within the service layer by the primary service which will be responsible for any such orchestration.

Service layer

The service layer applies permission constraints to any requests being made so that only calls from identified and permitted users are allowed to proceed. The service layer performs all orchestration and business logic, and is responsible for all mutations of objects that will be persisted by the underlying persistence layer such as stamping objects to be updated with the current user and update time.

The service layer provides implementations for any API that the component may have.

The service layer provides the DAO (Data Access Object) API for the persistence layer to implement but maintains no knowledge of underlying persistence implementation, e.g. database queries.

Persistence implementation, e.g. modules ending in `-impl-db`

Persistence layer - DAOs

The persistence layer is an implementation of one or more DAOs specified in the service layer. The persistence layer provides no logic, it just stores and retrieves objects in a database or other persistence technology. If serialisation/de-serialisation is required in order to persist the object then that should also be performed by this layer so that no code above this layer has to care about this implementation detail.

The persistence layer does not apply security or permissions checking so should not need to reference the security API.

4 - Contributing

How to make contributions to the Stroom GitHub repositories.

TODO

Feature branches
- Branch name
Change log
Checkstyle
PR conventions
- Link to issue

5 - Release Process

How to release new versions of the software.

5.1 - Releasing Stroom

How to release a new version of Stroom

Pre-requisites for a release

The following need to be completed before a release is made.

Logging changes

Stroom and its related repositories all have a CHANGELOG.md file for recording changes made between releases. Before making a release you should ensure that all changes have been recorded in the CHANGELOG. This is not done by directly editing the file but instead using the script ./log_change.sh.

log_change creates change entry files in the directory ./unreleased_changes/. This prevents merge conflicts that would happen with multiple people editing the CHANGELOG file.

The following examples show you how to use the log_change script.

# Log a change for the issue number in your current branch (e.g. branch: gh-1234-fix-dead-locks)
./log_change.sh auto "Fix database dead locks during purge job"

# Log a change for the issue number in your current branch (e.g. branch: gh-1234-fix-dead-locks)
# Your default editor will open the created skeleton change file
./log_change.sh auto

# Log a change with no associated issue
./log_change.sh 0 "Fix typo on about screen"

# Log a change for issue #1234
./log_change.sh 1234 "Fix database dead locks during purge job"

# Log a change for issue #1234
# Your default editor will open the created skeleton change file
./log_change.sh 1234

# Log a change for an issue in a different repository
./log_change.sh gchq/stroom#2424 "Fix database dead locks during purge job"

# Log a change for an issue in a different repository
# Your default editor will open the created skeleton change file
./log_change.sh gchq/stroom#2424

# List all unreleased changes
./log_change.sh list

Commit and push all changes

Before releasing, all local changes that you want in the release should be committed and pushed. Commits that you want in a release should be merged down to a release branch, e.g. 7.0 or master. Once pushed and merged, ensure that the branch passes the CI build.

Decide on the next version number

Stroom versioning follows Semantic Versioning.

Given a version number MAJOR.MINOR.PATCH:

MAJOR is incremented when there are major or breaking changes.
MINOR is incremented when functionality is added in a backwards compatible manner.
PATCH is incremented when bugs are fixed.

Based on the changes since the last release establish if it is a major, minor or patch release to determine the next version number.
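As a rough illustration of the rules above, the following sketch computes the next version number for a given release type. The function name next_version is hypothetical and it only handles plain MAJOR.MINOR.PATCH versions (optionally prefixed with v), not pre-release suffixes such as -alpha.1. In practice tag_release.sh suggests the next version for you.

```shell
# Print the next version for a MAJOR.MINOR.PATCH version and a release type.
# Usage: next_version 7.0.1 patch
next_version() {
  current="${1#v}"          # tolerate a leading "v", e.g. v7.0.1
  major="${current%%.*}"
  rest="${current#*.}"
  minor="${rest%%.*}"
  patch="${rest#*.}"
  case "$2" in
    major) echo "$((major + 1)).0.0" ;;                    # breaking changes
    minor) echo "${major}.$((minor + 1)).0" ;;             # new compatible functionality
    patch) echo "${major}.${minor}.$((patch + 1))" ;;      # bug fixes only
    *) echo "Unknown release type: $2" >&2; return 1 ;;
  esac
}

next_version 7.0.1 patch
next_version 7.0.1 minor
next_version 7.0.1 major
```

Note that lower order parts are reset to zero whenever a higher order part is incremented.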

Performing a named release of Stroom

Once all the above pre-requisites have been met you can trigger the release by running this command:

./tag_release.sh

This script will do the following:

Adds the content of the unreleased change entry files (created by log_change.sh) to the CHANGELOG.
Prompts for (and suggests) the next version based on the previous release.
Adds a new version heading to CHANGELOG.
Adds/updates the version compare links in the CHANGELOG.
Commits and pushes the change log changes.
Creates an annotated git tag using the release version number and change entries.

The tagged git commit will trigger a CI build that includes additional release elements such as:

Pushing the built docker images to DockerHub.
Creating a release for the git tag in GitHub with all the release artefacts.
Publishing any libraries to Sonatype and Maven Central.

Performing a named release of the docker stacks

Once the Stroom release build has finished and the artefacts are available on GitHub Releases and DockerHub you can create an associated release of the Stroom docker stacks.

In the following examples we will assume that you have just released Stroom v7.0.1 on branch 7.0, and the previous release was v7.0.0.

Checkout and pull the corresponding release branch in the stroom-resources repository.

cd stroom-resources
git checkout 7.0
git pull

Now edit the file bin/stack/container_versions.env and edit the following line, setting it to the version of stroom you have just released:

Before

  STROOM_TAG="v7.0.0"

After

  STROOM_TAG="v7.0.1"
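If you prefer to script this edit, a sketch along the following lines could be used. The function name set_stroom_tag is hypothetical, and it assumes the STROOM_TAG value is double quoted as in the example above.

```shell
# Set STROOM_TAG in an env file to the given version.
# Usage: set_stroom_tag bin/stack/container_versions.env v7.0.1
set_stroom_tag() {
  env_file="$1"
  new_tag="$2"
  # -i.bak works with both GNU and BSD sed; the backup is removed on success.
  sed -i.bak -E "s/^([[:space:]]*STROOM_TAG=)\"[^\"]*\"/\\1\"${new_tag}\"/" "$env_file" \
    && rm -f "${env_file}.bak"
}
```

Review the result with git diff before committing, as only the STROOM_TAG line should have changed.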

If any of the other docker image versions need updating then do it at this point.

Now add/commit/push the change. Check the CI build is successful for the new Stroom image.

If the build is green then tag the stacks release as follows:

pwd
# prints /home/dev/git_work/stroom-resources

# Clear out any previous build
rm -rf ./bin/stack/build

./tag_release-stroom-stacks.sh stroom-stacks-v7.0.1

This script will build all the stack variants locally to ensure they will build successfully, though it does not test them. If the local build is successful it will then create an annotated git tag which will trigger a release CI build. The release CI build will create an archive for each stack variant and add them as release artefacts.

SNAPSHOT releases

SNAPSHOT releases should not be released to Sonatype or Maven Central. If a development version of a library needs to be shared between projects then you can either use the Gradle task publishToMavenLocal to publish a SNAPSHOT version to your local Maven repository and change your dependency version to SNAPSHOT, or perform a named release along the lines of vx.y.z-alpha.n.

Release Versioning conventions

Semantic versioning is used and should be adhered to, see SemVer. The following are examples of valid version names:

SNAPSHOT - Used only for local development, never to be published publicly.
v3.3.0 - Initial release of v3.3, with an associated 3.3 branch.
v3.3.1 - A patch release to v3.3 on the 3.3 branch.
v3.4.0-alpha.1 - An alpha release of v3.4, either on master or a 3.4 branch
v3.4.0-beta.1 - A beta release of v3.4, either on master or a 3.4 branch
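The naming patterns in the examples above can be checked with a small sketch like the one below. The function name is_valid_version_name is hypothetical, and the assumption that alpha and beta are the only pre-release labels is drawn solely from the examples listed here.

```shell
# Return 0 if the argument matches one of the forms listed above:
# SNAPSHOT, vX.Y.Z, vX.Y.Z-alpha.N or vX.Y.Z-beta.N.
is_valid_version_name() {
  printf '%s\n' "$1" \
    | grep -Eq '^(SNAPSHOT|v[0-9]+\.[0-9]+\.[0-9]+(-(alpha|beta)\.[0-9]+)?)$'
}

for name in SNAPSHOT v3.3.0 v3.4.0-beta.1 3.3; do
  if is_valid_version_name "$name"; then
    echo "valid:   $name"
  else
    echo "invalid: $name"
  fi
done
```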

To Perform a Local Build

# Full build:
./gradlew clean build

# Build without unit tests
./gradlew clean build -x test

# Build without any tests or GWT compilation (GWT compilation applies to stroom only)
./gradlew clean build -x test -x gwtCompile

6 - Setting up releases to Sonatype & Maven Central

This is a rough guide to what was done to set it up. Some bits may be missing.

Create a Sonatype account

You need to create an account on Sonatype and you will need to raise a jira ticket on Sonatype’s jira to get approved on the uk.gov.gchq group. This will require an existing user approved for the group to approve you on the ticket.

Setting up a GPG key

You can use the following commands for setting up a GPG2 key for signing.

# Generate the GPG2 key
gpg2 --gen-key

# To list all keys
gpg2 --list-keys

# To get the key ID
gpg2  --list-secret-keys | grep "\[SC\]" | tr -s ' ' | cut -d' ' -f2 | cut -d'/' -f2

# To send the public keys to a key server
gpg2 --keyserver hkp://pool.sks-keyservers.net --send-keys <key id>
gpg2 --keyserver hkp://keyserver.ubuntu.com --send-keys <key id>
gpg2 --keyserver hkp://pgp.mit.edu --send-keys <key id>

# To display the secret key in base64 form, for use in GH actions
key="$(gpg2 --armor --export-secret-keys <key id> | base64 -w0)"; \
echo -e "-------\n$key\n-------"; \
key=""

Setting up the gradle build

The signing and release to Sonatype is done by various gradle plugins.

id "io.github.gradle-nexus.publish-plugin" version "1.0.0"
id "signing"
id "maven-publish"

See the root and event-logging-api gradle build files (in the event-logging repo) for an example of how to set up gradle.

The credentials can be passed to the Gradle build as environment variables prefixed with ORG_GRADLE_PROJECT_, which Gradle exposes as project properties (see Project Properties in the Gradle documentation). The credentials required are:

ORG_GRADLE_PROJECT_SIGNINGKEY - The key as produced by the gpg2 --armor command.
ORG_GRADLE_PROJECT_SIGNINGPASSWORD - The password for the GPG key.
ORG_GRADLE_PROJECT_SONATYPEUSERNAME - The account username on Sonatype.
ORG_GRADLE_PROJECT_SONATYPEPASSWORD - The account password on Sonatype.
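Before running a release build locally it can help to confirm that all four variables are set. The following is a sketch only; the function name missing_release_vars is hypothetical and it prints variable names, never their values.

```shell
# Print the names of any required credential variables that are unset or empty.
missing_release_vars() {
  for name in \
    ORG_GRADLE_PROJECT_SIGNINGKEY \
    ORG_GRADLE_PROJECT_SIGNINGPASSWORD \
    ORG_GRADLE_PROJECT_SONATYPEUSERNAME \
    ORG_GRADLE_PROJECT_SONATYPEPASSWORD
  do
    # Indirect variable lookup that works in POSIX sh as well as bash
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
    fi
  done
}

missing="$(missing_release_vars)"
if [ -n "$missing" ]; then
  echo "Missing credentials:"
  echo "$missing"
else
  echo "All release credentials are set"
fi
```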

Setting up Github Actions

You will need to provide Github with the four secrets listed above by setting them as repository secrets at https://github.com/gchq//settings/secrets/actions. For each one create a secret with the ORG_GRADLE_... bit as the name.

So that the action can create the Github release you will also need to set up an SSH key pair and provide it with the public and private key. To generate the key pair do:

ssh-keygen -t rsa -b 4096 -f <repo>_deploy_key

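The key pair (<repo>_deploy_key and <repo>_deploy_key.pub) will be created in the current directory, so run the command from ~/.ssh/ if you want the keys stored there.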

Create a repo deploy key with the public key, named ‘Actions Deploy Key’ and with write access at https://github.com///settings/keys/new. Create a repo secret with the private key, named ‘SSH_DEPLOY_KEY’ at https://github.com///settings/secrets/actions/new.