Developer Guide
- 1: Software Stack
- 2: Running Stroom in an IDE
- 3: Components
- 4: Contributing
- 5: Release Process
- 5.1: Releasing Stroom
- 6: Setting up releases to Sonatype & Maven Central
1 - Software Stack
Stroom and Stroom Proxy
Stroom and Stroom Proxy live in the same repository, share some common code and are built by the same Gradle build.
Languages and key frameworks
- Java 15 - The language for the core application
- Dropwizard - A RESTful framework incorporating embedded Jetty.
- Junit 5
- SLF4j and Logback
- Mockito
- Jooq - Generates Java code for type safe SQL.
- Apache Lucene - The search library used by Stroom’s indexes and dashboard queries.
- Lightning Memory Mapped Database - Used for memory mapped persistent reference and search data stores.
- React - Some of the new UI screens.
- Typescript
Build and development tools
- Gradle - Building the Java application and orchestrating related sub-builds, e.g. npm.
- GitHub Actions - The CI build and release.
- Bash - Various utility shell scripts.
- Docker - Building the stroom and stroom-proxy docker images.
- Docker Compose
- Docker containers - Provide consistent build environments for:
  - Java
  - npm
  - Plant UML
- npm - For the build of the new React based UI screens.
Services
- Nginx - Used for SSL termination, load balancing and reverse proxying.
- MySQL - Database for persistence in Stroom.
2 - Running Stroom in an IDE
We tend to use IntelliJ as our Java IDE of choice. This is a guide for running Stroom in IntelliJ for the purposes of developing/debugging Stroom.
Prerequisites
In order to build/run/debug Stroom you will need the following:
- OpenJDK 15
- Git
- Gradle
- IntelliJ
- Docker CE
- Docker Compose
These instructions assume that all services will either run in the IDE or in Docker containers.
We develop on Linux so if you are running on a Mac you may experience issues with some of our shell scripts. For running the various shell scripts in our repositories you are advised to install:
- bash 4+
- jq
- GNU grep
- GNU sed
Stroom git repositories
To develop Stroom you will need to clone/fork multiple git repositories. To quickly clone all of the Stroom repositories you can use the helper script described in stroom-resources.
Database setup
Stroom requires a MySQL database to run. You can either point stroom at a local MySQL server or use the MySQL Docker container from stroom-resources.
MySQL in a Docker container
See the section below on stroom-resources.
Host based MySQL server
With an instance of MySQL server 8.0 running on your local machine do the following to create the stroom database:
Then run the following commands in the MySQL shell:
Local configuration file
When running stroom in an IDE you need to have a local configuration file to allow you to change settings locally without affecting the repository.
The local configuration file lives in the root of the Stroom repository at ./local.yml.
To create a default version of this file run this script from within the root of the stroom git repository.
This will create ./local.yml using stroom-app/dev.yml as a template.
So that you can run a multi-node cluster it will also create ./local2.yml and ./local3.yml.
These files are not source controlled so you can make any changes you like to them, e.g. setting log levels or altering Stroom property values.
stroom-resources
As a minimum to develop stroom you will need clones of the stroom and stroom-resources git repositories.
stroom-resources provides the docker-compose configuration for running the many docker containers needed.
Having cloned stroom-resources, navigate to the directory stroom-resources/bin and run the script
On first run this will create a default version of the git-ignored file stroom-resources/bin/local.env which is intended for use by developers to configure the docker stacks to run.
This file is used to set a number of environment variables that docker compose will use to configure the various containers.
The key variable in there is SERVICE_LIST.
This is a bash array that sets the services to run.
By default it is set to run stroom-all-dbs (MySQL + database init scripts) and nginx which are sufficient for running Stroom in an IDE.
Verify the Gradle build
Before trying to run Stroom in an IDE it is worth performing a Gradle build to verify the code compiles and all dependencies are present. This command will run all parts of the build except for the tests which can take 20+mins to run. Some parts of the build are run inside docker containers (to remove the need to install additional dependencies) so on first run there will be an overhead of building the docker image layers. These layers will be cached which will speed up future builds.
Local or embedded MySQL
The Junit integration tests that need a database can either be run against the local MySQL (i.e. stroom-all-dbs) or an embedded MySQL instance.
Configuring the database used can be done with the JVM argument -DuseEmbeddedMySql=false, which can be set in Run/Debug Configurations => Edit configuration templates… => JUnit => VM options in IntelliJ.
A value of false will use your local MySQL instance; true will use the embedded one.
The CI build uses the embedded MySQL.
The pros/cons of using the embedded instance are:
- Pro: No dependency on stroom-resources to run the full build.
- Con: Requires the MySQL binaries to be downloaded, sometimes multiple times.
- Con: Consumes a lot of disk space if multiple instances are run.
- Con: Harder to debug tests as the database is destroyed at the end of the test.
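Because the flag is an ordinary JVM system property, test set-up code can read it with System.getProperty. The following is only a minimal sketch of that idea: the class TestDbSelector is hypothetical, and the default of true when the flag is absent is an assumption rather than Stroom's documented behaviour.

```java
import java.util.Properties;

// Hypothetical helper, not Stroom's actual code.
final class TestDbSelector {

    static final String PROP_NAME = "useEmbeddedMySql";

    private TestDbSelector() {
    }

    // Reads the flag from the supplied properties.
    // Assumption: an absent flag means "use the embedded instance".
    static boolean useEmbeddedMySql(final Properties props) {
        return Boolean.parseBoolean(props.getProperty(PROP_NAME, "true"));
    }

    // Convenience overload that reads the real JVM system properties,
    // i.e. whatever was passed as -DuseEmbeddedMySql=...
    static boolean useEmbeddedMySql() {
        return useEmbeddedMySql(System.getProperties());
    }
}
```

Passing the Properties object in explicitly keeps the decision easy to exercise in a test without changing global JVM state.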
Clearing down your environment
If you need to work from a clean slate and you are using the container based MySQL you can run the following:
Warning
This script will delete ALL containers running/stopped whether related to Stroom or not. It is essentially a clean slate for your docker environment. If you are running other unrelated containers, don’t run this.
It will also delete all stroom state held on the filesystem, i.e. the stream store and lucene index shards.
Sample Data
When developing Stroom it is helpful to have Stroom run with pre-loaded content and data as by default it will be completely empty.
SetupSampleData.java is a class that loads pre-defined content and data into the database and file system so that Stroom can begin processing data on boot.
This sample data/content is very useful for manually testing and exercising the application in development.
This class assumes that the database being used for Stroom is completely empty.
To run SetupSampleData use the pre-defined Run Configuration in IntelliJ called SetupSampleData. This will load content (e.g. XSLTs, Pipelines, etc.), create Feeds and load data into the Feeds.
You should now have a database and stream store populated with tables and data, providing you with some predefined feeds, data, translations, pipelines, dashboards, etc.
When Stroom is next started it will begin to process the data using the pre-defined pipelines.
Running Stroom from the IDE
The user interface for Stroom is built using GWT (see GWT Project for more information or GWT specific documentation). As a result Stroom needs to be started up with GWT Super Dev Mode. Super Dev Mode handles the on-the-fly compilation of the Java user interface source into JavaScript and the source map that links client JavaScript back to Java source for client side debugging.
The following steps for running and debugging Stroom in IDEA assume you have a MySQL database running on localhost:3307, with a database stroom and user stroomuser already created.
JAVA_HOME
Ensure the environment variable JAVA_HOME is set and points to a valid JDK 15 directory.
Alternatively, to simplify the process of installing and managing Java JDKs, consider using SDKMan.
Build stroom-app
NOTE: During development, it is helpful to skip running unit and integration tests, to speed up the build process:
Start a single Stroom node
- Select the IDEA run configuration named Stroom GWT SuperDevMode
- Click Debug. Stroom will start, with log output displayed in the Run pane at the bottom of the window.
This run configuration essentially sets the JVM argument -DgwtSuperDevMode=true to run the application in Super Dev Mode.
Watch the log output. Once you see a log INFO message containing the text “Started”, you will be able to launch the app in a browser from: https://localhost.
You will see the Stroom blue background, with a username/password prompt. Enter the following default credentials:
- Username: admin
- Password: admin
You can now interact with Stroom and set breakpoints in Java code.
Note that setting breakpoints in any of the java code in modules suffixed with -client (i.e. client side GWT Java code) does not have any effect, as these components are compiled to static JavaScript.
Breakpoints in modules ending -shared will only have an effect if you are debugging server side code.
Note
Stroom has been written with Google’s Chrome browser in mind so has only been tested on Chrome. Behaviour in other browsers may vary. We would like to improve cross-browser support so please let us know about any browser incompatibilities that you find.
Starting the Super Dev Mode Compiler
With the Stroom application running you need to also run a draft GWT compile and run the Super Dev Mode compiler.
On first use it is recommended to run:
This will ensure a clean state of the GWT compiled javascript. It may be necessary to re-run the clean, and draft compile if there have been significant changes to the Java code or if there are problems running Stroom in Super Dev Mode.
Normally however you can just run:
When this gradle task runs it will echo some instructions for how to set up your browser. Once the browser is all set up with the dev mode favorites you can visit Stroom at:
- http://localhost:8080 (bypassing Nginx)
- https://localhost (via Nginx)
Running without Nginx is simpler but can hide problems with the Stroom/Nginx configuration/integration.
Authentication
In development you can either run Stroom with authentication on or off. It is a quicker development experience with authentication turned off but this can hide any problems with authentication flow.
To run Stroom with authentication turned off set the following in local.yml:
stroom:
  security:
    authentication:
      authenticationRequired: false
If you want to run with authentication but don’t want to be prompted to change the password on first boot you can set:
stroom:
  security:
    identity:
      passwordPolicy:
        forcePasswordChangeOnFirstLogin: false
Alternatively you can run the IntelliJ Run Configuration Stroom Reset Admin Password, which will reset the password to admin and prevent further prompts to change it.
Right click behaviour
Stroom overrides the default right click behaviour in the browser with its own context menu.
For UI development it is often required to have access to the browser’s context menu for example to inspect elements.
To enable the browser’s context menu you need to ensure this property is set to null in dev.yml:
stroom:
  ui:
    oncontextmenu: null
To return it to its default value, set it to "return false;".
Hot loading GWT UI code changes
If you make any changes to the Java code in -client or -shared modules then in order for them to be hot loaded into the Javascript code you simply need to refresh the browser.
This will trigger Super Dev Mode to recompile any changed code.
If you have made significant code changes, e.g. moving/renaming classes, then GWT can get confused so you may need to run the gwtDraftCompile and/or gwtClean gradle tasks followed by gwtSuperDevMode.
Debugging GWT UI code
To debug the GWT UI code you will need to use Chrome Dev Tools (shift+ctrl+i).
Setting breakpoints in the UI code in IntelliJ will have no effect.
SuperDevMode creates source maps that link the running javascript back to Java code that you can set breakpoints in.
To find the Java source in Chrome Dev Tools open the Sources tab then in the left hand navigator pane (Page tab) select:
Top => ui => stroom (ui) => 127.0.0.1:9876 => sourcemaps/stroom => stroom
This folder then contains all the stroom java packages.
3 - Components
Some examples of components in Stroom include
- stroom-activity - Component for recording a user’s actions against a current activity.
- stroom-dictionary - Component for storing lists of words.
- stroom-statistics - Component for recording statistical data, e.g. amount of data received in X minutes.
In the project structure a component appears as a first level subdirectory of the root project folder. Components have further subdirectories (modules) that make up the various parts of the component, e.g.
- stroom - Root project
  - stroom-activity - The component
    - stroom-activity-api - API module for stroom-activity
    - stroom-activity-impl - Implementation of the API and other module implementation code
    - stroom-activity-impl-db - Database persistence implementation used by impl
    - stroom-activity-impl-db-jooq - JOOQ generated classes used by stroom-activity-impl-db
    - stroom-activity-mock - Mock implementation for the stroom-activity API
Dependencies between a component’s modules
The diagram below shows the dependencies between the different modules that make up a component as well as the internal dependencies within the impl module.
The actual implementations used at runtime are determined by Guice bindings in whichever Guice modules are loaded by the application.
Tests can bind mock implementations of a component’s API just by using the Guice module within the mock module.
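The effect of swapping in a mock can be illustrated without Guice. In the sketch below plain constructor injection stands in for the Guice bindings, and every type (ActivityRecorder, MockActivityRecorder, DictionaryEditor) is hypothetical rather than taken from the Stroom code base.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical API type, as might live in an -api module.
interface ActivityRecorder {
    void record(String userId, String action);
}

// Hypothetical mock, as might live in a -mock module. It only remembers calls.
final class MockActivityRecorder implements ActivityRecorder {
    final List<String> recorded = new ArrayList<>();

    @Override
    public void record(final String userId, final String action) {
        recorded.add(userId + ":" + action);
    }
}

// A class in another component that depends only on the API type.
final class DictionaryEditor {
    private final ActivityRecorder activityRecorder;

    // In Stroom the argument would be supplied by Guice; here it is passed by hand.
    DictionaryEditor(final ActivityRecorder activityRecorder) {
        this.activityRecorder = activityRecorder;
    }

    void addWord(final String userId, final String word) {
        activityRecorder.record(userId, "addWord " + word);
    }
}
```

Because DictionaryEditor only ever sees the API interface, a test can hand it the mock and inspect what was recorded without loading the real implementation.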
Dependencies between components
Typically a component will need to call out to other components to apply security constraints and to log user activity. These typical relationships are shown in the diagram below.
Component API, e.g. modules ending in -api
API layer
All communication between components in stroom must be made via a component’s API. The API provides the minimum surface area for communication between components and decouples dependencies between components to just the API code. For component testing purposes mock implementations of these APIs can be used to limit testing to just a single component.
Component API and service implementation, e.g. modules ending in -impl
Client interaction - REST services and GWT Action Handlers
The uppermost layer of the server side code services requests from the client. The client may make RESTful calls, as is the case for the new UI, or will use Actions that are handled by ActionHandlers, as is the case for the legacy GWT UI.
Since this layer deals with all client interaction it should be responsible for creating audit logs for all user activity, e.g. accessing documents, searching etc. No audit logging should need to be performed at a lower level within the application as deeper levels have less knowledge of user intent since they may just be playing a part in the wider request.
The client interaction layer adds no logic and asks the underlying service layer to service the encapsulated request away from the REST endpoint wrapping code or GWT action handler code. This allows multiple types of endpoint to use the same underlying service layer. If a request requires the use of multiple services to form a response, this must be handled within the service layer by the primary service which will be responsible for any such orchestration.
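As a rough illustration of this layer, the sketch below shows an endpoint wrapper that records the user's intent and otherwise just delegates. The names DocLookupService, AuditLog and DocLookupEndpoint are hypothetical, and no real REST or GWT framework code is shown.

```java
// Hypothetical service-layer interface; this is where the business logic lives.
interface DocLookupService {
    String fetch(String docId);
}

// Hypothetical audit sink.
interface AuditLog {
    void log(String userId, String event);
}

// Hypothetical endpoint wrapper. It adds no logic of its own: it passes the
// request to the service layer and records what the user did.
final class DocLookupEndpoint {
    private final DocLookupService service;
    private final AuditLog auditLog;

    DocLookupEndpoint(final DocLookupService service, final AuditLog auditLog) {
        this.service = service;
        this.auditLog = auditLog;
    }

    String view(final String userId, final String docId) {
        final String doc = service.fetch(docId);
        // Audit here, at the point where the user's intent is known.
        auditLog.log(userId, "VIEW " + docId);
        return doc;
    }
}
```

A second kind of endpoint, e.g. a GWT action handler, could wrap the same DocLookupService in the same way.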
Service layer
The service layer applies permission constraints to any requests being made so that only calls from identified and permitted users are allowed to proceed. The service layer performs all orchestration and business logic, and is responsible for all mutations of objects that will be persisted by the underlying persistence layer such as stamping objects to be updated with the current user and update time.
The service layer provides implementations for any API that the component may have.
The service layer provides the DAO (Data Access Object) API for the persistence layer to implement but maintains no knowledge of underlying persistence implementation, e.g. database queries.
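The responsibilities described for the service layer could be sketched as follows. All names are hypothetical, and the set of permitted users is a deliberately simplified stand-in for a call to the security API.

```java
import java.time.Instant;
import java.util.Set;

// Hypothetical persisted object.
final class WordList {
    final String name;
    String updatedBy;
    Instant updatedAt;

    WordList(final String name) {
        this.name = name;
    }
}

// DAO API declared by the service layer; a persistence module implements it.
interface WordListDao {
    void save(WordList wordList);
}

// Hypothetical service: checks permission, stamps the object, then delegates.
final class WordListService {
    private final WordListDao dao;
    private final Set<String> permittedUsers; // stand-in for the security API

    WordListService(final WordListDao dao, final Set<String> permittedUsers) {
        this.dao = dao;
        this.permittedUsers = permittedUsers;
    }

    void update(final String userId, final WordList wordList, final Instant now) {
        if (!permittedUsers.contains(userId)) {
            throw new SecurityException("User " + userId + " may not update " + wordList.name);
        }
        // Mutations happen here, before the object reaches the persistence layer.
        wordList.updatedBy = userId;
        wordList.updatedAt = now;
        dao.save(wordList);
    }
}
```

The service knows only the WordListDao interface, so it carries no knowledge of how or where the object is stored.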
Persistence implementation, e.g. modules ending in -impl-db
Persistence layer - DAOs
The persistence layer is an implementation of one or more DAOs specified in the service layer. The persistence layer provides no logic; it just stores and retrieves objects in a database or other persistence technology. If serialisation/de-serialisation is required in order to persist the object then that should also be performed by this layer so that no code above this layer has to care about this implementation detail.
The persistence layer does not apply security or permissions checking so should not need to reference the security API.
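A persistence implementation along these lines might look like the sketch below. The types are hypothetical, a HashMap of byte arrays stands in for a database table, and the newline-separated encoding is an arbitrary choice for illustration that assumes terms are non-empty and contain no newlines.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// DAO API as the service layer might declare it (hypothetical).
interface TermListDao {
    void put(String name, List<String> terms);

    Optional<List<String>> get(String name);
}

// Hypothetical persistence implementation with no security checks or business logic.
final class InMemoryTermListDao implements TermListDao {
    // Stand-in for a database table with a binary column.
    private final Map<String, byte[]> table = new HashMap<>();

    @Override
    public void put(final String name, final List<String> terms) {
        // Serialisation happens here so callers only ever see List<String>.
        table.put(name, String.join("\n", terms).getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public Optional<List<String>> get(final String name) {
        final byte[] bytes = table.get(name);
        if (bytes == null) {
            return Optional.empty();
        }
        final String text = new String(bytes, StandardCharsets.UTF_8);
        if (text.isEmpty()) {
            return Optional.of(List.of());
        }
        return Optional.of(List.of(text.split("\n")));
    }
}
```

Swapping the encoding or the storage technology would not change the TermListDao interface, so code above this layer would be unaffected.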
4 - Contributing
TODO
- Feature branches
- Branch name
- Change log
- Checkstyle
- PR conventions
- Link to issue
5 - Release Process
5.1 - Releasing Stroom
Pre-requisites for a release
The following need to be completed before a release is made.
Logging changes
Stroom and its related repositories all have a CHANGELOG.md file for recording changes made between releases.
Before making a release you should ensure that all changes have been recorded in the CHANGELOG.
This is not done by directly editing the file but instead using the script ./log_change.sh.
log_change creates change entry files in the directory ./unreleased_changes/.
This prevents merge conflicts that would happen with multiple people editing the CHANGELOG file.
The following examples show you how to use the log_change script.
Commit and push all changes
Before releasing, all local changes that you want in the release should be committed and pushed.
Commits that you want in a release should be merged down to a release branch, e.g. 7.0 or master.
Once pushed and merged ensure that the branch passes the CI build.
Decide on the next version number
Stroom versioning follows Semantic Versioning.
Given a version number MAJOR.MINOR.PATCH:
- MAJOR is incremented when there are major or breaking changes.
- MINOR is incremented when functionality is added in a backwards compatible manner.
- PATCH is incremented when bugs are fixed.
Based on the changes since the last release establish if it is a major, minor or patch release to determine the next version number.
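The increment rules can be made concrete with a small sketch. VersionBump is a hypothetical helper and not part of Stroom's release scripts; it resets the lower-order parts to zero as Semantic Versioning requires, always returns a v-prefixed version, and does not handle pre-release suffixes such as -alpha.1.

```java
// Hypothetical helper illustrating the MAJOR.MINOR.PATCH increment rules.
final class VersionBump {

    enum Change { MAJOR, MINOR, PATCH }

    private VersionBump() {
    }

    static String next(final String current, final Change change) {
        // Accept an optional leading "v", e.g. "v7.0.1" or "7.0.1".
        final String bare = current.startsWith("v") ? current.substring(1) : current;
        final String[] parts = bare.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected MAJOR.MINOR.PATCH but got " + current);
        }
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        int patch = Integer.parseInt(parts[2]);

        switch (change) {
            case MAJOR:
                major++;
                minor = 0;
                patch = 0;
                break;
            case MINOR:
                minor++;
                patch = 0;
                break;
            default:
                patch++;
        }
        return "v" + major + "." + minor + "." + patch;
    }
}
```

For example, a bug-fix release following v7.0.0 would be computed with VersionBump.next("v7.0.0", VersionBump.Change.PATCH).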
Performing a named release of Stroom
Once all the above pre-requisites have been met you can trigger the release by running this command:
This script will do the following:
- Adds the content of the unreleased change entry files (created by log_change.sh) to the CHANGELOG.
- Prompts for (and suggests) the next version based on the previous release.
- Adds a new version heading to CHANGELOG.
- Adds/updates the version compare links in the CHANGELOG.
- Commits and pushes the change log changes.
- Creates an annotated git tag using the release version number and change entries.
The tagged git commit will trigger a CI build that includes additional release elements such as:
- Pushing the built docker images to DockerHub.
- Creating a release for the git tag in GitHub with all the release artefacts.
- Publishing any libraries to Sonatype and Maven Central.
Performing a named release of the docker stacks
Once the Stroom release build has finished and the artefacts are available on GitHub Releases and DockerHub you can create an associated release of the Stroom docker stacks.
In the following examples we will assume that you have just released Stroom v7.0.1 on branch 7.0, and the previous release was v7.0.0.
Checkout and pull the corresponding release branch in the stroom-resources repository.
Now edit the file bin/stack/container_versions.env and change the following line to the version of stroom you have just released, i.e. from
STROOM_TAG="v7.0.0"
to
STROOM_TAG="v7.0.1"
If any of the other docker image versions need updating then do it at this point.
Now add/commit/push the change. Check the CI build is successful for the new Stroom image.
If the build is green then tag the stacks release as follows:
This script will build all the stack variants locally to ensure they will build successfully, though it does not test them. If the local build is successful it will then create an annotated git tag which will trigger a release CI build. The release CI build will create an archive for each stack variant and add them as release artefacts.
SNAPSHOT releases
SNAPSHOT releases should not be released to Sonatype or Maven Central.
If a development version of a library needs to be shared between projects then you can either use the Gradle task publishToMavenLocal to publish a SNAPSHOT version to your local Maven repository and change your dependency version to SNAPSHOT, or perform a named release along the lines of vx.y.z-alpha.n.
Release Versioning conventions
Semantic versioning is used, and this should be adhered to, see SemVer. The following are examples of valid version names:
- SNAPSHOT - Used only for local development, never to be published publicly.
- v3.3.0 - Initial release of v3.3, with an associated 3.3 branch.
- v3.3.1 - A patch release to v3.3 on the 3.3 branch.
- v3.4.0-alpha.1 - An alpha release of v3.4, either on master or a 3.4 branch.
- v3.4.0-beta.1 - A beta release of v3.4, either on master or a 3.4 branch.
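A check for these version name forms could be sketched as below. ReleaseNames is a hypothetical helper, and the pattern deliberately covers only the forms listed in this section (SNAPSHOT, vX.Y.Z, and vX.Y.Z with an -alpha.N or -beta.N suffix), not everything Semantic Versioning permits.

```java
import java.util.regex.Pattern;

// Hypothetical validator for the version name forms listed in this section.
final class ReleaseNames {

    private static final Pattern VALID = Pattern.compile(
            "SNAPSHOT|v\\d+\\.\\d+\\.\\d+(-(alpha|beta)\\.\\d+)?");

    private ReleaseNames() {
    }

    static boolean isValid(final String name) {
        // matches() requires the whole string to match, not just a prefix.
        return VALID.matcher(name).matches();
    }
}
```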
To Perform a Local Build
6 - Setting up releases to Sonatype & Maven Central
Create a Sonatype account
You need to create an account on Sonatype and you will need to raise a Jira ticket on Sonatype’s Jira to get approved on the uk.gov.gchq group. This will require an existing user approved for the group to approve you on the ticket.
Setting up a GPG key
You can use the following commands for setting up a GPG2 key for signing.
Setting up the gradle build
The signing and release to Sonatype is done by various gradle plugins.
id "io.github.gradle-nexus.publish-plugin" version "1.0.0"
id "signing"
id "maven-publish"
See the root and event-logging-api gradle build files (in the event-logging repo) for an example of how to set up gradle.
The credentials can be passed to the gradle build using special gradle env vars (see Gradle’s Project Properties documentation). The credentials required are:
- ORG_GRADLE_PROJECT_SIGNINGKEY - The key as produced by the gpg2 --armor command.
- ORG_GRADLE_PROJECT_SIGNINGPASSWORD - The password for the GPG key.
- ORG_GRADLE_PROJECT_SONATYPEUSERNAME - The account username on Sonatype.
- ORG_GRADLE_PROJECT_SONATYPEPASSWORD - The account password on Sonatype.
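Gradle exposes an environment variable named ORG_GRADLE_PROJECT_ followed by a name as a project property with that name. The sketch below only illustrates that naming convention; GradleEnvProps is a hypothetical class and is not Gradle's own implementation.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the ORG_GRADLE_PROJECT_ naming convention.
final class GradleEnvProps {

    static final String PREFIX = "ORG_GRADLE_PROJECT_";

    private GradleEnvProps() {
    }

    // Returns the project property names and values implied by the environment,
    // e.g. ORG_GRADLE_PROJECT_SIGNINGPASSWORD becomes SIGNINGPASSWORD.
    static Map<String, String> projectProperties(final Map<String, String> env) {
        final Map<String, String> props = new TreeMap<>();
        env.forEach((name, value) -> {
            if (name.startsWith(PREFIX) && name.length() > PREFIX.length()) {
                props.put(name.substring(PREFIX.length()), value);
            }
        });
        return props;
    }
}
```

Calling GradleEnvProps.projectProperties(System.getenv()) would list the properties a build run from the current shell would see from this source.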
Setting up Github Actions
You will need to provide Github with the four secrets listed above by setting them as repository secrets at https://github.com/gchq/ORG_GRADLE_...
bit as the name.
So that the action can create the Github release you will also need to set up an SSH key pair and provide it with the public and private key. To generate the key pair do:
The key pair will be created in ~/.ssh/.
Create a repo deploy key with the public key, named ‘Actions Deploy Key’ and with write access at https://github.com/