The page that you are currently viewing is for an old version of Stroom (7.1). The documentation for the latest version of Stroom (7.8) can be found using the version drop-down at the top of the screen.

Glossary

Glossary of common words and terms used in Stroom.

Account

Refers to a user account in Stroom’s internal Identity Provider.

API

Application Programming Interface. An interface that one system can present so other systems can use it to communicate. Stroom has a number of APIs, e.g. its many REST APIs and its /datafeed interface for data receipt.

API Key

A Token created by Stroom’s internal IDP for authenticating with the API. It is an encrypted string that contains details of the user and the expiration date of the token. Possession of a valid API Key for a user account means that you can do anything via the API that the user can do in the user interface. API Keys should therefore be protected carefully and treated like a password. If you are using an external IDP, tokens for use with the API are generated by the external IDP instead.

Byte Order Mark

A special Unicode character at the start of a text stream that indicates the byte order (or endianness) of the stream.

See Byte Order Mark for more detail.

Condition

A Condition in a query expression term, e.g. =, >, in, etc.

Content

Content in Stroom typically means the documents/entities created in Stroom and seen in the explorer tree. Content can be created/modified by Stroom users and imported/exported for sharing between different Stroom instances.

Dashboard

A Dashboard is a configurable entity for querying one or more Data Sources and displaying the results as a table, a visualisation or some other form.

See the User Guide for more detail.

Data Source

The source of data for a Query, e.g. a Lucene-based Index, a SQL Statistics Data Source, etc. There are three types of Data Source:

Lucene based search index data sources.
Stroom’s SQL Statistics data sources.
Searchable data sources for searching the internals of Stroom.

A data source will have a DocRef to identify it and will define the set of Fields that it presents. Each Field will have:

A name
A set of Conditions that it supports. For example, a Feed field would likely support is but not >.
A flag to indicate whether it is queryable. A queryable field can be referenced in both the query expression tree and a Dashboard table, whereas a non-queryable field can only be referenced in a Dashboard table.
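The field properties listed above can be pictured with a small Python model. This is a sketch under stated assumptions: the class DataSourceField, the function term_allowed and the example fields are hypothetical illustrations, not Stroom's actual classes or field definitions.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class DataSourceField:
    """Hypothetical model of a Data Source Field (not Stroom's real class)."""
    name: str
    conditions: FrozenSet[str]  # the Conditions this field supports
    queryable: bool             # may it appear in the query expression tree?


def term_allowed(f: DataSourceField, condition: str) -> bool:
    """True if `f` may be used in a query expression term with `condition`.

    Any field, queryable or not, may still be shown in a Dashboard table.
    """
    return f.queryable and condition in f.conditions


# Hypothetical example fields
feed = DataSourceField("Feed", frozenset({"is", "in"}), queryable=True)
text = DataSourceField("Text", frozenset(), queryable=False)
```

With these definitions, term_allowed(feed, "is") is true while term_allowed(feed, ">") and term_allowed(text, "is") are false, mirroring the Feed example in the list.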

Data Splitter

Data Splitter is a pipeline element for converting text data (e.g. CSV, fixed width, delimited, multi-line) into XML for onward processing.

See the User Guide or the Pipeline Element Reference for more detail.

Dictionary

An entity for storing static content, e.g. lists of terms for use in a query with the in dictionary condition. Dictionaries can also be used to hold arbitrary text for use in XSLTs with the dictionary() XSLT function.
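As a rough illustration of the in dictionary condition, the sketch below assumes that a Dictionary holds one term per line and that a value matches when it exactly equals one of those terms. The names load_terms, in_dictionary and BAD_USERS are hypothetical; consult the User Guide for how Stroom actually interprets Dictionary content.

```python
from typing import Set


def load_terms(dictionary_text: str) -> Set[str]:
    """Assumption: one term per line; surrounding whitespace and blank lines ignored."""
    return {line.strip() for line in dictionary_text.splitlines() if line.strip()}


def in_dictionary(value: str, dictionary_text: str) -> bool:
    # Exact, case-sensitive match against any term (an assumption of this sketch)
    return value in load_terms(dictionary_text)


# Hypothetical Dictionary content
BAD_USERS = """
alice
bob
"""
```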

DocRef

A DocRef is an identifier used to identify most documents/entities in Stroom, e.g. an XSLT will have a DocRef. It consists of the following parts:

UUID - A Universally Unique Identifier to uniquely identify the document/entity.
Type - The type of the document/entity, e.g. Index, XSLT, Dashboard, etc.
Name - The name given to the document/entity.

DocRefs are used heavily in the REST API for identifying the document/entity to be acted on.
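A DocRef can be modelled as an immutable triple of UUID, type and name. The Python sketch below is illustrative only: the DocRef class, the new_doc_ref helper and the assumption that the JSON form uses the keys uuid, type and name are not taken from this page, so check the REST API documentation for the exact wire format.

```python
import json
import uuid as uuid_lib
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DocRef:
    """Illustrative DocRef: the (uuid, type, name) triple described above."""
    uuid: str
    type: str
    name: str

    def __post_init__(self) -> None:
        # Reject malformed identifiers early; uuid.UUID raises ValueError if invalid
        uuid_lib.UUID(self.uuid)

    def to_json(self) -> str:
        # Assumed JSON shape: {"name": ..., "type": ..., "uuid": ...}
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "DocRef":
        data = json.loads(text)
        return cls(uuid=data["uuid"], type=data["type"], name=data["name"])


def new_doc_ref(doc_type: str, name: str) -> DocRef:
    # Hypothetical helper: a fresh random (version 4) UUID for a new document
    return DocRef(str(uuid_lib.uuid4()), doc_type, name)
```

Because the UUID alone uniquely identifies the document, the type and name mainly help humans and clients interpret the reference.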

Document

Typically refers to an item that can be created in the Explorer Tree, e.g. a Feed, a Pipeline, a Dashboard, etc. May also be known as an Entity.

Entity

Typically refers to an item that can be created in the Explorer Tree, e.g. a Feed, a Pipeline, a Dashboard, etc. May also be known as a Document.

Events

This is a Stream Type in Stroom. An Events stream consists of processed/cooked data that has been demarcated into individual Events. Typically in Stroom an Events stream will contain data conforming to the event-logging XML Schema which provides a normalised form for all Raw Events to be transformed into.

Explorer Tree

The left-hand navigation tree. The Explorer Tree is used for finding, opening, creating, renaming, copying, moving and deleting Entities. It can also be used to control the access permissions of entities and folders. The tree can be filtered using the quick filter; see Finding Things for more details.

Expression Tree

A tree of expression terms that each evaluate to a boolean (True/False) value. Terms can be grouped together within an expression operator (AND, OR, NOT). For example:

AND (
  Feed is CSV_FEED
  Type = Raw Events
)

Expression Trees are used in Processor Filters and Query expressions.
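To make the evaluation rule concrete, here is a minimal Python sketch that evaluates the example tree above against a record of field values. The Term, Operator and evaluate names are hypothetical, only the is and = conditions are handled, and treating NOT as the negation of the AND of its children is an assumption of this sketch rather than documented Stroom behaviour.

```python
from dataclasses import dataclass
from typing import List, Mapping, Union


@dataclass
class Term:
    field: str
    condition: str  # only "is" and "=" are handled in this sketch
    value: str


@dataclass
class Operator:
    op: str  # "AND", "OR" or "NOT"
    children: List["Node"]


Node = Union[Term, Operator]


def evaluate(node: Node, record: Mapping[str, str]) -> bool:
    """Recursively reduce an expression tree to a single boolean."""
    if isinstance(node, Term):
        if node.condition in ("is", "="):
            return record.get(node.field) == node.value
        raise ValueError(f"unsupported condition: {node.condition}")
    results = [evaluate(child, record) for child in node.children]
    if node.op == "AND":
        return all(results)
    if node.op == "OR":
        return any(results)
    if node.op == "NOT":
        # Assumption: NOT negates the AND of its children
        return not all(results)
    raise ValueError(f"unknown operator: {node.op}")


# The example tree from the text
tree = Operator("AND", [Term("Feed", "is", "CSV_FEED"), Term("Type", "=", "Raw Events")])
```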

Feed

A Feed is a means of organising and categorising data in Stroom. A Feed contains multiple Streams of data that have been ingested into Stroom or output by a Pipeline. Typically a Feed will contain Streams of data that are all from one system and have a common data format.

Field

A named data Field within some form of record or entity, where each Field can have an associated value. In Stroom, Fields can be the Fields in an Index (or other queryable Data Source) or the fields of Meta Data associated with a Stream, e.g. Stream ID, Feed, creation time, etc.

Filter (Processor)

See Processor Filter.

FQDN

Fully Qualified Domain Name, i.e. the hostname of a machine qualified with its full domain. For example somehost.somedomain.com.

Group (Users)

A named group of users to which application and document permissions can be assigned. Users can belong to multiple groups. Groups allow permissions to be assigned once and affect multiple users.
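The benefit of groups can be sketched as a set union: a user's effective permissions are their own plus those of every group they belong to. This Python sketch is an assumption-laden simplification; the effective_permissions function and all user, group and permission names are hypothetical, and it ignores document-level inheritance and any group nesting.

```python
from typing import Dict, Set


def effective_permissions(
    user: str,
    user_perms: Dict[str, Set[str]],
    memberships: Dict[str, Set[str]],
    group_perms: Dict[str, Set[str]],
) -> Set[str]:
    """Union of a user's own permissions and those of each group they are in."""
    perms = set(user_perms.get(user, set()))
    for group in memberships.get(user, set()):
        perms |= group_perms.get(group, set())
    return perms


# Hypothetical users, groups and permission names
group_perms = {"Analysts": {"Use Dashboards"}, "Admins": {"Manage Users"}}
memberships = {"alice": {"Analysts", "Admins"}, "bob": {"Analysts"}}
user_perms = {"bob": {"Export Data"}}
```

Granting a permission to Analysts once changes the result for both alice and bob, which is the point of assigning permissions to groups.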

Identity Provider (IDP)

An Identity Provider is a system or service that can authenticate a user and assert their identity. Identity providers can support single sign-on (SSO), which allows the user to sign in once to the Identity Provider and then be authenticated to all systems using that IDP. Examples of identity providers are Google, Cognito and Keycloak. Stroom has its own built-in IDP or can be configured to use a 3rd party IDP.

Parser

A Parser is a Pipeline element for parsing Raw Events into a structured form. For example, the Data Splitter Parser parses text data into Records and Fields.

See the Pipeline Element Reference for details.

Pipeline

A Pipeline is an entity that is constructed to take a single input of stream data and process/transform it, producing one or more outputs. A Pipeline can have many elements within it to read, process or transform the data flowing through it.

See the User Guide for more detail.
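The single-input, multiple-output shape of a Pipeline can be sketched as a chain of functions followed by a fan-out to writers. This is a deliberately simplified Python model: run_pipeline, Element and Output are hypothetical names, and real Stroom pipelines are trees of configured elements rather than plain string functions.

```python
from typing import Callable, List

Element = Callable[[str], str]  # a transforming element: data in, data out
Output = Callable[[str], None]  # a terminal element, e.g. a writer


def run_pipeline(data: str, elements: List[Element], outputs: List[Output]) -> None:
    """Pass one input through each element in turn, then send it to every output."""
    for element in elements:
        data = element(data)
    for write in outputs:
        write(data)
```

For example, run_pipeline("  hello ", [str.strip, str.upper], [a.append, b.append]) delivers "HELLO" to both lists a and b, with str.strip and str.upper standing in for a parser and a translation.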

Pipeline Element

An element within a Pipeline that performs some action on the data flowing through it.

See the Pipeline Element Reference for more detail.

Processor

A Processor belongs to a Pipeline. It controls the processing of data through its parent Pipeline. The Processor can be enabled/disabled to enable/disable the processing of data through the Pipeline. A Processor will have one or more Processor Filters associated with it.

Processor Filter

A Processor Filter consists of an Expression Tree to select which Streams to process through its parent Pipeline. For example, a typical Processor Filter would have an Expression Tree that selects all Streams of type Raw Events in a particular Feed. A filter could also select a single Stream by its ID, e.g. when Re-Processing a Stream. The filter is used to find Streams to process through the Pipeline associated with the Processor Filter. A Pipeline can have multiple Processor Filters. Filters can be enabled/disabled independently of their parent Processor to control processing.
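The typical filter described above (all Raw Events Streams in one Feed) can be sketched in Python as follows. It is a hypothetical model: StreamMeta, ProcessorFilter and select_stream_ids are illustrative names, and the expression is hard-coded as Feed AND Type rather than being a general Expression Tree.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StreamMeta:
    """Hypothetical subset of Stream meta data."""
    id: int
    feed: str
    type: str


@dataclass
class ProcessorFilter:
    """Hard-codes the typical 'Feed is X AND Type = Y' expression."""
    feed: str
    stream_type: str
    enabled: bool = True

    def matches(self, meta: StreamMeta) -> bool:
        return meta.feed == self.feed and meta.type == self.stream_type


def select_stream_ids(f: ProcessorFilter, streams: List[StreamMeta]) -> List[int]:
    # A disabled filter selects nothing, regardless of its expression
    if not f.enabled:
        return []
    return [s.id for s in streams if f.matches(s)]


streams = [
    StreamMeta(1, "CSV_FEED", "Raw Events"),
    StreamMeta(2, "CSV_FEED", "Events"),
    StreamMeta(3, "OTHER_FEED", "Raw Events"),
]
```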

Property

A configuration Property for configuring Stroom. Properties can be set in the user interface or in the config.yml configuration file.

See Properties for more detail.

Query

The search Query in a Dashboard that selects the data to display. The Query is constructed using an Expression Tree of terms.

See the User Guide for more detail.

Raw Events

This is a Stream Type used for Streams received by Stroom. Streams received by Stroom will be in a variety of text formats (CSV, delimited, fixed width, XML, JSON, etc.). Until they have been processed by a pipeline, they are essentially just unstructured character data with no concept of what constitutes a record/event. A Parser in a pipeline is required to provide the demarcation between records/events.

Re-Processing

The act of repeating the processing of a set of input data (Streams) that have already been processed at least once. Re-Processing can be done for an individual Stream or multiple Streams using a Processor Filter.

Records

This is a Stream Type for Streams containing data conforming to the records:2 XML Schema. It also refers more generally to XML conforming to the records:2 XML Schema, which is used in a number of places in Stroom, including as the output format for the DSParser and the input for the IndexingFilter.
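To give a feel for the records:2 form, the Python sketch below turns headed CSV into a records element containing one record per row and one data element (with name and value attributes) per column. This is an assumption-based simplification of the schema, the function names are hypothetical, and it is not how Data Splitter or the DSParser work internally.

```python
import csv
import io
import xml.etree.ElementTree as ET

RECORDS_NS = "records:2"
ET.register_namespace("", RECORDS_NS)  # serialise records:2 as the default namespace


def _q(tag: str) -> str:
    # Qualify a tag name with the records:2 namespace
    return f"{{{RECORDS_NS}}}{tag}"


def csv_to_records_xml(text: str) -> str:
    """Convert headed CSV text into simplified records:2-style XML.

    Assumes every row has the same number of columns as the header.
    """
    root = ET.Element(_q("records"))
    for row in csv.DictReader(io.StringIO(text)):
        record = ET.SubElement(root, _q("record"))
        for name, value in row.items():
            ET.SubElement(record, _q("data"), {"name": name, "value": value or ""})
    return ET.tostring(root, encoding="unicode")
```

Calling csv_to_records_xml("date,user\n2024-01-01,alice\n") yields a single record holding data elements named date and user, inside a root element declaring xmlns="records:2".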

Search Extraction

The process of extracting un-indexed Field values from the source Event/Record to be used in search results.

See the User Guide for more detail.

Stream

A Stream is the unit of data that Stroom works with and will typically contain many Events.

See the User Guide for more detail.

Stream Type

All Streams must have a Stream Type. The list of Stream Types is configured using the Property stroom.data.meta.metaTypes. Additional Stream Types can be added; however, the list of Stream Types must include the following built-in types:

Context
Error
Events
Meta
Raw Events
Raw Reference
Reference

Some Stream Types, such as Meta and Context, only exist as child streams within another Stream.

Stepper

The Stepper is a tool in Stroom for developing and debugging a Pipeline. It allows the user to simulate passing a Stream through a pipeline with the ability to step from one record/event to the next or to jump to records/events based on filter criteria. The parsers and translations can be edited while in the Stepper, with the element output updating to show the effect of the change. The Stepper will not write data to the file system or stream stores.

Token

Typically refers to an authentication token that may be used for user authentication or as an API Key. Tokens are JSON Web Tokens (JWT) that are set in the HTTP Authorization header with a value of the form Bearer TOKEN_GOES_HERE. Tokens are associated with a Stroom User, so they have the same or fewer permissions than that user. Tokens also have an expiry time after which they will no longer work.
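The header format and the idea that a token carries user and expiry details can be illustrated with the Python standard library. In this sketch the helper names and the URL are hypothetical; sub and exp are the standard JWT claim names from RFC 7519, but the exact claims a Stroom token contains are not specified here. Note that jwt_claims only decodes the payload and does not verify the signature, so it must never be used to decide whether a token is trustworthy.

```python
import base64
import json
import urllib.request


def auth_request(url: str, token: str) -> urllib.request.Request:
    # Build (but do not send) a request carrying the token as a Bearer credential
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def jwt_claims(token: str) -> dict:
    """Decode a JWT's payload WITHOUT verifying its signature (inspection only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the base64 padding JWTs strip
    return json.loads(base64.urlsafe_b64decode(payload))
```

A request built this way would be sent with urllib.request.urlopen; anyone holding the token string can do the same, which is why tokens should be treated like passwords.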

Tracker

A Tracker is associated with a Processor Filter and keeps track of the Streams that the Processor Filter has already processed.
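The Tracker's role can be sketched as remembering which matching Streams have already been handed out, so that each Stream is processed once per filter. This Python model is a simplification under stated assumptions: the Tracker class and next_batch method are hypothetical, and the real Tracker's internal state is not described on this page.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Tracker:
    """Simplified tracker: remembers IDs of streams already queued for processing."""
    processed: Set[int] = field(default_factory=set)

    def next_batch(self, matching_ids: Iterable[int], limit: int) -> List[int]:
        # Oldest (lowest ID) unprocessed streams first, at most `limit` of them
        batch = [i for i in sorted(matching_ids) if i not in self.processed][:limit]
        self.processed.update(batch)
        return batch
```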

User

Refers to a Stroom User that is linked to either an Account in Stroom’s internal IDP or a user account in an external IDP. A Stroom User is only concerned with authorisation (i.e. application/document permissions and group memberships), and not authentication.

UUID

A Universally Unique Identifier for uniquely identifying something. UUIDs are used as the identifier in DocRefs. An example of a UUID is 4ffeb895-53c9-40d6-bf33-3ef025401ad3.

See the User Guide for more detail.

Visualisation

A document comprising some JavaScript code for visualising data, e.g. pie charts, heat maps, line graphs, etc. Visualisations are not built into Stroom; they are content, so they can be created/modified/shared by Stroom users.

Volume

In Stroom, a Volume is a logical storage area that Stroom can write data to. Volumes are associated with a path on a file system that can either be local to the Stroom node or on a shared file system. Stroom has two types of Volume: Index Volumes and Data Volumes.

Index Volume - Where the Lucene Index Shards are written to. An Index Volume must belong to a Volume Group.
Data Volume - Where streams are written to. When writing Stream data, Stroom will pick a data volume using a volume selector as configured by the Property stroom.data.filesystemVolume.volumeSelector.

See the User Guide for more detail.

Volume Group

A Volume Group is a collection of one or more Index Volumes. Index Volumes must belong to a Volume Group, and Indexes are configured to write to a particular Volume Group. When Stroom writes data to a Volume Group, it will choose which of the Volumes in the group to write to using a volume selector as configured by the Property stroom.volumes.volumeSelector.

See the User Guide for more detail.
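A volume selector is simply a policy for choosing one Volume from a list. The Python sketch below shows two plausible policies, round robin and most free space by percentage, as illustrations only; the class names are hypothetical and the selector names Stroom actually accepts for its volumeSelector Properties should be taken from the Properties documentation.

```python
import itertools
from dataclasses import dataclass
from typing import List


@dataclass
class Volume:
    path: str
    total_bytes: int
    used_bytes: int

    @property
    def free_fraction(self) -> float:
        return 1 - self.used_bytes / self.total_bytes


class RoundRobinSelector:
    """Cycles through the volumes in order, one per write."""

    def __init__(self) -> None:
        self._counter = itertools.count()

    def select(self, volumes: List[Volume]) -> Volume:
        return volumes[next(self._counter) % len(volumes)]


class MostFreePercentSelector:
    """Picks the volume with the largest fraction of free space."""

    def select(self, volumes: List[Volume]) -> Volume:
        return max(volumes, key=lambda v: v.free_fraction)
```

Round robin spreads writes evenly regardless of capacity, while a free-space policy favours emptier volumes, which matters when volumes in a group differ in size.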

XSLT

Extensible Stylesheet Language Transformations is a language for transforming XML documents into other XML documents. XSLTs are the primary means of transforming data in Stroom. All data is converted into a basic form of XML and then XSLTs are used to decorate and transform it into a common form. XSLTs are also used to transform XML Events data into non-XML forms or XML with a different schema for indexing, statistics or for sending to other systems.

See the User Guide for more detail.

Last modified May 6, 2025: Merge branch '7.0' into 7.1 (883e1d1)