Version 7.0

Key new features and changes present in v7.0 of Stroom and Stroom-Proxy.

For a detailed list of the changes in v7.0 see the

Integrated Authentication

The previously standalone (in v6) stroom-auth-service and stroom-auth-ui services have been integrated into the core stroom application. This simplifies the installation and configuration of stroom.

Configuration Properties Improvements

Configuration is now provided by YAML files on boot

Previously stroom used a flat .conf file to manage the application configuration. Application logging was configured either via a .yml file (in v6) or in an .xml file (in v5). Now stroom uses a single .yml file to configure both the application and logging. This file is different to the .yml file(s) used in the docker compose configuration. The YAML file provides a more logical hierarchical structure and supports typed values (longs, doubles, maps, lists, etc.).

The YAML configuration is intended for configuration items that are either needed to bootstrap stroom or have values that are specific to a node. Cluster wide configuration properties are still stored in the database and managed via the UI.

There has been a change to the precedence of the configuration properties held in different locations (YAML, database, default) and this is described in Properties.

Stroom Home and relative paths

The concept of Stroom Home has been introduced. Stroom Home allows for one path to be configured and for all other configurable paths to default to being a child of this path. This keeps all configured directories in one place by default. Each configured directory can be set to an absolute path if a location outside Stroom Home is required. If a relative path is used it will be relative to Stroom Home. Stroom Home can be configured with the property stroom.path.home.
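The resolution rule described above (absolute paths used as-is, relative paths resolved against Stroom Home) can be sketched as follows. This is a minimal illustration assuming POSIX-style paths; the function name `resolve_path` and the example directories are hypothetical and not taken from stroom's code.

```python
from pathlib import PurePosixPath


def resolve_path(configured: str, stroom_home: str) -> PurePosixPath:
    """Hypothetical helper: absolute paths are returned unchanged,
    relative paths are treated as children of Stroom Home
    (the value of stroom.path.home)."""
    path = PurePosixPath(configured)
    if path.is_absolute():
        return path
    return PurePosixPath(stroom_home) / path


# Hypothetical values for illustration only.
home = "/opt/stroom"
print(resolve_path("volumes/default", home))
print(resolve_path("/data/stroom/temp", home))
```

Keeping relative paths by default means moving Stroom Home relocates every dependent directory in one step, while an absolute path remains an explicit opt-out.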

Improved Properties UI screens that tell you the values over the cluster

Previously the Properties UI screens could only tell you the values held within the database and not the value that a node was actually using. The Properties screens have been improved to tell you the source of a property value and, where multiple values exist across the cluster, which nodes hold which values. See Properties.

Validation of Configuration Property Values

Validation of configuration property values is now possible. The validation rules are defined in the application code and allow for things like:

Ensuring that a regex pattern is a valid pattern.
Setting maximum or minimum values for numeric properties.
Ensuring a property has a value.

Validation will be enforced on application boot or when a value is edited via the UI.
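The three kinds of rule listed above can be illustrated with a small rule-based validator. This is a sketch only: stroom defines its rules in its own (Java) application code, and the rule helpers and property names below are hypothetical.

```python
import re
from typing import Any, Callable, Optional

# A rule takes a property value and returns an error message, or None if valid.
Rule = Callable[[Any], Optional[str]]


def not_null() -> Rule:
    """The property must have a value."""
    return lambda v: "must have a value" if v is None else None


def valid_regex() -> Rule:
    """The value, if present, must compile as a regex pattern."""
    def check(v: Any) -> Optional[str]:
        if v is None:
            return None  # null handling is left to not_null()
        try:
            re.compile(v)
        except re.error as e:
            return f"is not a valid regex pattern ({e})"
        return None
    return check


def in_range(minimum: float, maximum: float) -> Rule:
    """The value, if present, must lie within [minimum, maximum]."""
    def check(v: Any) -> Optional[str]:
        if v is not None and not (minimum <= v <= maximum):
            return f"must be between {minimum} and {maximum}"
        return None
    return check


def validate(name: str, value: Any, rules: list[Rule]) -> list[str]:
    """Apply every rule and collect messages prefixed with the property name."""
    errors = []
    for rule in rules:
        message = rule(value)
        if message is not None:
            errors.append(f"{name} {message}")
    return errors


# Hypothetical property names for illustration.
print(validate("stroom.example.pattern", "[a-z", [not_null(), valid_regex()]))
print(validate("stroom.example.batchSize", 5000, [in_range(1, 1000)]))
```

Collecting every failure, rather than stopping at the first, lets a boot-time check or a UI edit report all problems with a property at once.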

Hot Loading of Node Configuration

Now that node-specific configuration is managed via the YAML configuration file, stroom will detect changes to this file and update the configuration properties accordingly. Some properties, however, do not support being changed at runtime, so changes to them will still require either the whole system or the UI nodes to be restarted.
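One simple way to detect changes to a configuration file is to poll its modification time. The sketch below shows that general technique only; stroom's actual change-detection mechanism may differ, and the class name `ConfigFileWatcher` is hypothetical.

```python
import os
import time
from typing import Callable


class ConfigFileWatcher:
    """Hypothetical polling watcher: calls on_change(path) whenever the
    file's modification time differs from the last one seen."""

    def __init__(self, path: str, on_change: Callable[[str], None]) -> None:
        self.path = path
        self.on_change = on_change
        self._last_mtime = os.stat(path).st_mtime_ns

    def poll(self) -> bool:
        """Check the file once; return True if a change was detected."""
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._last_mtime:
            self._last_mtime = mtime
            self.on_change(self.path)  # e.g. re-read and apply the YAML
            return True
        return False

    def run(self, interval_secs: float = 5.0) -> None:
        """Poll forever at a fixed interval."""
        while True:
            self.poll()
            time.sleep(interval_secs)
```

In a real reload, the `on_change` callback would re-parse the file and apply only those properties that support runtime changes, flagging the rest as needing a restart.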

Data retention impact summary

The Data Retention screen now provides an Impact Summary tab that will show you a summary of what will be deleted by the current active rules. The summary is based on the rules as they currently are in the UI, so it allows you to see the impact before saving rule changes. The summary is a count of the number of streams that will be deleted by each rule, broken down by feed and stream type. In very large systems with a lot of data, or where complex rules are in place, the summary may take some time (minutes) to produce.

See Data Retention for more details.
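The shape of the impact summary (a count per rule, broken down by feed and stream type) can be sketched as below. This is an illustration under stated assumptions, not stroom's implementation: the data classes are hypothetical, stream age stands in for whatever time criteria real rules use, and the sketch assumes the first matching rule in list order decides a stream's fate.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Stream:  # hypothetical, simplified stream metadata
    feed: str
    stream_type: str
    age_days: int


@dataclass(frozen=True)
class RetentionRule:  # hypothetical, simplified rule
    name: str
    matches: Callable[[Stream], bool]
    retain_days: Optional[int]  # None means "keep forever"


def impact_summary(streams: list[Stream], rules: list[RetentionRule]) -> Counter:
    """Count the streams each rule would delete, keyed by (rule, feed, type).

    Assumption: rules are checked in list order and the first match
    decides what happens to a stream."""
    counts: Counter = Counter()
    for stream in streams:
        rule = next((r for r in rules if r.matches(stream)), None)
        if rule is None or rule.retain_days is None:
            continue  # no matching rule, or the rule keeps data forever
        if stream.age_days > rule.retain_days:
            counts[(rule.name, stream.feed, stream.stream_type)] += 1
    return counts


# Hypothetical data for illustration.
streams = [
    Stream("FEED_A", "Raw Events", 40),
    Stream("FEED_A", "Events", 40),
    Stream("FEED_B", "Raw Events", 10),
]
rules = [
    RetentionRule("Raw for 30 days", lambda s: s.stream_type == "Raw Events", 30),
    RetentionRule("Keep everything else", lambda s: True, None),
]
print(impact_summary(streams, rules))
```

Because every stream has to be tested against the rules, the cost grows with the volume of data, which is consistent with the summary taking minutes on very large systems.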

Fuzzy Finding in Quick Filters and Suggestion Text Fields

A richer fuzzy find algorithm has been added to the Quick Filter search fields. It has also been added to some text input fields that offer suggestions, e.g. Feed Name input fields. This makes finding values or rows in a table faster and more precise.

See Finding Things for more details.
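To illustrate the general idea of fuzzy finding, the sketch below uses a simple ordered-subsequence matcher: a candidate matches if every query character appears in it in order. This is not stroom's algorithm (see Finding Things for how stroom's matching actually behaves); the function names are hypothetical.

```python
from typing import Optional


def fuzzy_match(query: str, candidate: str) -> Optional[int]:
    """Return a score if every query character appears in the candidate in
    order (case-insensitive), else None. The score is the length of the span
    covered by the greedy leftmost match; lower means a tighter match."""
    q, c = query.lower(), candidate.lower()
    if not q:
        return 0
    pos = -1
    start = None
    for ch in q:
        pos = c.find(ch, pos + 1)
        if pos == -1:
            return None
        if start is None:
            start = pos
    return pos - start + 1


def fuzzy_filter(query: str, candidates: list[str]) -> list[str]:
    """Keep matching candidates, tightest matches first, ties by name."""
    scored = [(fuzzy_match(query, c), c) for c in candidates]
    return [c for _, c in sorted((s, c) for s, c in scored if s is not None)]


feeds = ["TEST_FEED_RAW", "FEED_TEST", "OTHER"]
print(fuzzy_filter("tfr", feeds))
print(fuzzy_filter("feed", feeds))
```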

New (off-heap) memory efficient reference data

The reference data feature in previous versions of stroom loaded the reference data on demand and held it in Java’s heap memory. In large systems, or where a pipeline performs reference data lookups across a wide time range, this can lead to very large heap sizes.

In v7 stroom now uses an off-heap, disk-backed store (LMDB) for the reference data. This removes all reference data (with the exception of context lookups) from the Java heap, so the -Xmx value can be reduced. In large systems this can mean keeping your -Xmx value below the 32GB threshold, above which the JVM can no longer use compressed object pointers, further reducing memory usage. Because the store is disk backed, frequently used reference data can be kept in the store to reduce the loading overhead. As the reference data is held off-heap, stroom can make use of all available free RAM for the reference data.

See Reference Data

Reference Data API

A RESTful API has been added for the reference data store. This primarily allows reference lookups to be performed by external systems.

See Reference Data API
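As a rough illustration of how an external system might call such an API, the sketch below builds (but does not send) an authenticated JSON POST using only the Python standard library. The URL path `/api/refData/lookup` and the JSON field names are hypothetical placeholders, not stroom's documented contract; check the Swagger specification of your own stroom instance for the real endpoint and request shape.

```python
import json
import urllib.request


def build_lookup_request(base_url: str, token: str, map_name: str,
                         key: str, effective_time: str) -> urllib.request.Request:
    """Build a POST request for a reference data lookup.

    HYPOTHETICAL: the path and body field names are placeholders."""
    body = {
        "mapName": map_name,
        "key": key,
        "effectiveTime": effective_time,
    }
    return urllib.request.Request(
        url=f"{base_url}/api/refData/lookup",  # hypothetical path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_lookup_request("https://stroom.example.com", "my-api-key",
                           "USER_TO_LOCATION", "jbloggs",
                           "2020-01-01T00:00:00.000Z")
```

Once the real path and fields are confirmed, the request could be sent with `urllib.request.urlopen(req)` and the response body read as the looked-up value.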

Text editor improvements

The Ace text editor is used widely in Stroom for such things as editing XSLTs, editing dashboard column expressions, viewing stream data and stepping. There have been a number of improvements to this editor.

See Editing and Viewing Text Data

Editor context menu

Additional options have been added to the context menu in the text editor:

Toggle soft line wrapping of long lines.
Toggle viewing hidden characters, e.g. tabs, spaces, line breaks.
Toggle Vim key bindings. The Ace editor does not implement all Vim functionality but supports the core key bindings.
Toggle auto-completion. Completion is triggered using ctrl+space.
Toggle live auto-completion. Completion is triggered as you type.
Toggle the inclusion of snippets in the auto-complete suggestions.

Auto-completion and snippets

Most editor screens now support basic auto-completion of existing words found in the text. Some editor screens, such as XSLT, dashboard column expressions and JavaScript scripts, also support keyword and snippet completion.

Data viewing improvements

The way data is viewed in Stroom has changed to improve the viewing of large files or files with no line breaks. Previously a set number of lines of data would be fetched for display on the page in the Data Viewer. This did not work for data that has no line breaks, as Stroom would then try to fetch all of the data.

In v7 Stroom works at the character level, so it can fetch a reasonable number of characters for display whether they are all on one line or spread over multiple lines.
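The difference between fetching by lines and fetching by characters can be shown in a few lines. In this sketch an in-memory string stands in for a stored stream, and the function names are hypothetical.

```python
def fetch_lines(data: str, max_lines: int) -> str:
    """Line-based fetch: the first max_lines lines. If the data has no
    line breaks, this returns everything, however large."""
    return "".join(data.splitlines(keepends=True)[:max_lines])


def fetch_chars(data: str, offset: int, max_chars: int) -> str:
    """Character-based fetch: at most max_chars characters from offset,
    whether they fall on one line or many."""
    return data[offset:offset + max_chars]


one_long_line = "x" * 1_000_000
print(len(fetch_lines(one_long_line, 100)))      # 1000000: the whole "file"
print(len(fetch_chars(one_long_line, 0, 1000)))  # 1000: a bounded page
```

A character budget bounds each fetch regardless of how the data is laid out, which is the property the line-based approach lacked.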

The viewing of data has been separated into two mechanisms, Data Preview and Source View.

See Editing and Viewing Text Data

Data Preview

This is the default view of the data. It displays the first n characters (configurable) of the data. It will attempt to format the data, e.g. showing pretty-printed XML. You cannot navigate around the data.

Source View

This view is intended for seeing the actual data in its raw, unformatted form and for navigating around it. This view provides navigation controls to define the range of data being displayed, e.g. from a character offset, line number, or line and column.
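Navigating by line and column ultimately means translating that position into a character offset. The sketch below shows one way to do this, assuming 1-based lines and columns and `\n` line endings; the function names are hypothetical and stroom's own conventions may differ.

```python
def offset_of(text: str, line: int, column: int) -> int:
    """Convert a 1-based line and column into a 0-based character offset.
    Does not check that the column falls within the line."""
    if line < 1 or column < 1:
        raise ValueError("line and column are 1-based")
    offset = 0
    for _ in range(line - 1):
        newline = text.find("\n", offset)
        if newline == -1:
            raise ValueError(f"text has fewer than {line} lines")
        offset = newline + 1  # start of the next line
    return offset + column - 1


def source_view(text: str, line: int, column: int, length: int) -> str:
    """Return `length` characters starting at the given line and column."""
    start = offset_of(text, line, column)
    return text[start:start + length]


sample = "abc\ndef\nghi"
print(offset_of(sample, 2, 2))  # 5, the position of "e"
```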

You can now query data, server tasks and processing tasks on dashboards

TODO

Complete this section

Data actions such as delete, download, reprocess now provide an impact summary before proceeding.

TODO

Complete this section

Index volume groups for easier index volume assignment

TODO

Complete this section

Kafka Integration

New Kafka Configuration Entity

Integration with Apache Kafka was introduced in v6; however, the way the connection to Kafka cluster(s) is configured has been improved. We have introduced a new entity type called Kafka Configuration that can be created/managed via the explorer tree. This means stroom can integrate with many Kafka clusters or connect to a cluster using different sets of Kafka Configuration properties. The Kafka Configuration entity provides an editor for setting all the Kafka-specific configuration properties. Pipeline elements that use Kafka now provide a means to select the Kafka Configuration to use.

TODO

Add user guide section on Kafka configuration

An Improved Pipeline Element for Sending Data to Kafka

The previous Kafka pipeline elements in v6 have been replaced with a single StandardKafkaProducer element. The new element allows for the dynamic construction of a Kafka Producer message via an XML document conforming to the kafka-records XmlSchema. With this new element, events can be translated into Kafka records, which will then be given to the Kafka Producer to send to the Kafka cluster. This allows for complete control of things like timestamps, topics, keys, values, etc.
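To make the idea concrete, the sketch below turns an XML records document into individual producer records, each with its own topic, key, value and timestamp. The element and attribute names are hypothetical and deliberately simplified; they are not the real kafka-records XmlSchema (which, for example, may use a namespace), so consult that schema before writing translations for StandardKafkaProducer.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProducerRecord:  # hypothetical, simplified record
    topic: Optional[str]
    key: Optional[str]
    value: str
    timestamp: Optional[int]  # epoch milliseconds, if given


def parse_records(xml_text: str) -> list[ProducerRecord]:
    """Read each <kafkaRecord> child (HYPOTHETICAL element names)."""
    root = ET.fromstring(xml_text)
    records = []
    for el in root.findall("kafkaRecord"):
        ts = el.get("timestamp")
        records.append(ProducerRecord(
            topic=el.get("topic"),
            key=el.findtext("key"),  # None when no <key> element
            value=el.findtext("value", default=""),
            timestamp=int(ts) if ts is not None else None,
        ))
    return records


SAMPLE = """<kafkaRecords>
  <kafkaRecord topic="events" timestamp="1577836800000">
    <key>host-01</key>
    <value>{"user": "jbloggs"}</value>
  </kafkaRecord>
  <kafkaRecord topic="audit">
    <value>hello</value>
  </kafkaRecord>
</kafkaRecords>"""

for record in parse_records(SAMPLE):
    print(record)
```

Because the topic, key and timestamp come from the document rather than from fixed element settings, a single translation can route different events to different topics.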

TODO

Add user guide section on Kafka Standard Producer

No limitations on data reprocessing

TODO

Complete this section

Improved REST API

A rich REST API for all UI accessible functions

The architecture of the stroom UI has been changed such that all communication between the UI and the back end is via REST calls. This means all of these REST calls are available as an API for users of stroom to take advantage of. It opens up the possibility of interacting with stroom via scripts or from other applications.

Swagger UI to document REST API methods

The Swagger UI and specification file have been improved to include all of the API methods available in stroom.
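Because the specification file follows the Swagger/OpenAPI format, a script can enumerate the available methods from it. The sketch below walks the standard `paths` object of a parsed specification; the example paths and operation IDs are hypothetical and do not reflect stroom's actual API.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_operations(spec: dict) -> list[tuple[str, str, str]]:
    """Return (METHOD, path, operationId) for every operation in a parsed
    Swagger/OpenAPI spec, sorted by path then method. Non-method keys in a
    path item (e.g. "parameters", "summary") are skipped."""
    ops = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method.lower() in HTTP_METHODS:
                ops.append((method.upper(), path, op.get("operationId", "")))
    return sorted(ops, key=lambda t: (t[1], t[0]))


# Hypothetical spec fragment for illustration.
spec = {
    "paths": {
        "/node/v1": {
            "get": {"operationId": "listNodes"},
            "post": {"operationId": "createNode"},
        },
        "/feed/v1/{uuid}": {
            "parameters": [],
            "get": {"operationId": "fetchFeed"},
        },
    }
}
for op in list_operations(spec):
    print(*op)
```

In practice the spec would be loaded with `json.load` from the specification file published by your stroom instance.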

Improved architecture with separate modules with individual DB access to spread load.

The architecture of the core stroom application has been fundamentally changed in v7 to internally break up the application into its functional areas. This separation makes for a more logical code base and allows for the possibility of each functional area having its own database instance, if required.

Java 12

stroom v7 now runs on the Java 12 JVM.

MySQL 8 support.

stroom v7 has been changed to support MySQL v8, opening up the possibility of using features like group replication.

Last modified November 1, 2024: Merge branch '7.3' into 7.4 (98246aa)