Glossary

A glossary of common terms used in this documentation.

1 - A

1.1 - Account

Refers to a user account in Stroom’s internal Identity Provider.

1.2 - API

Application Programming Interface. An interface that one system can present so other systems can use it to communicate. Stroom has a number of APIs, e.g. its many REST APIs and its /datafeed interface for data receipt.

1.3 - API Key

API Keys are a form of authentication token that are created within Stroom for use by Stroom-Proxy instances or other clients that want to use Stroom’s API. It is an encrypted string that contains details of the user and the expiration date of the token. Possession of a valid API Key for a user account means that you can do anything that the user can do in the user interface via the API.

API Keys should therefore be protected carefully and treated like a password. If you are using an external Identity Provider then tokens for use with the Identity Provider are generated by the external Identity Provider.

1.4 - Application permission

This is a permission that is not specific to a single document. It applies to all documents or is not related to documents in any way.

Application permissions are generally associated with a screen or functional area of the Stroom application. A lot of the application permissions tend to be more applicable to system administrators but allow fine grained control of the different functional areas in Stroom so these functions can be devolved to other users.

Examples of application permissions are Manage Users, Pipeline Stepping and Data - View.

2 - B

2.1 - Byte order mark

A special Unicode character at the start of a text stream that indicates the byte order (or endianness) of the stream.
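As a rough illustration (Python is used here purely for demonstration and is not part of Stroom), a byte order mark can be detected by comparing the first bytes of a stream against the known BOM byte sequences. The helper name `detect_bom` is hypothetical.

```python
import codecs

# Known BOM byte sequences from the standard library, longest first so that
# UTF-32 LE (FF FE 00 00) is not mistaken for UTF-16 LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no BOM is present."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


# Build a small UTF-16 little-endian sample that starts with a BOM.
sample = codecs.BOM_UTF16_LE + "hello".encode("utf-16-le")
encoding, bom_len = detect_bom(sample)

# Skip the BOM bytes before decoding the remaining text.
text = sample[bom_len:].decode(encoding)
```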

3 - C

3.1 - Character encoding

Character Encoding is the means of encoding character data (i.e. text) into binary form. Therefore to decode character data from a stream of bytes, the character encoding must be known (or guessed).

Common examples of character encodings are ASCII, UTF-8 and UTF-16.

Each Feed has a defined character encoding for the data and context. This allows Stroom to decode the data sent into that Feed.
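The following sketch, in Python and unrelated to Stroom's own code, shows why the encoding must be known: the same bytes decode to different text depending on the encoding assumed.

```python
text = "café"

utf8_bytes = text.encode("utf-8")       # 'é' becomes two bytes
utf16_bytes = text.encode("utf-16-le")  # each character here is two bytes

# Decoding with the correct encoding recovers the original text.
round_trip = utf8_bytes.decode("utf-8")

# Decoding with the wrong encoding raises no error here but gives garbled text.
garbled = utf8_bytes.decode("latin-1")
```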

3.2 - Condition

A Condition is the comparison operator in a query expression term, e.g. =, >, in, etc.

3.3 - Content

Content in Stroom typically means the documents/entities created by users in Stroom, as seen in the explorer tree. Content can be created/modified by Stroom users and imported/exported for sharing between different Stroom instances.

3.4 - Context data

This is an additional stream of contextual data that is sent alongside the main event stream. It provides a means for the sending system to send additional data that relates only to the event stream it is sent alongside.

This can be useful where the sending system has no control over the data in the event stream and the event stream does not contain contextual information such as what machine it is running on or the location of that machine.

The contextual information (such as hostname, FQDN, physical location, etc.) can be sent in a Context Stream so that the two can be combined together during pipeline processing using stroom:lookup().

3.5 - Cron

Cron is a command line utility found on most Linux/Unix systems that is used for scheduling background tasks. Cron expressions (or variants of them) are widely used in other schedulers.

Stroom uses a scheduler called Quartz which supports cron expressions for scheduling. The full details of the cron syntax supported by Quartz can be found in the Quartz documentation.

3.6 - CSV

Comma Separated Values is a file format with typically one record per line and fields delimited by a ,. Fields may be optionally enclosed with double quotes, though there is no fixed standard for CSV data, particularly when it comes to escaping of double quotes and/or commas.
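As an illustration of the quoting problem, the sketch below uses Python's standard `csv` module, which follows one common convention: fields containing commas are enclosed in double quotes, and a double quote inside a quoted field is escaped by doubling it. Other producers may escape differently, so this is only one possible interpretation of CSV data. The sample data is made up.

```python
import csv
import io

raw = (
    "id,user,message\n"
    '1,jbloggs,"Logged in, then out"\n'
    '2,asmith,"Said ""hello"""\n'
)

# csv.reader handles the quoted comma and the doubled double quotes.
rows = list(csv.reader(io.StringIO(raw)))
```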

4 - D

4.1 - Dashboard

A Dashboard is a configurable entity for querying one or more Data Sources and displaying the results as a table, a visualisation or some other form.

4.2 - Data source

The source of data for a Query, e.g. a Lucene based Index, a SQL Statistics Data source, etc.

There are three types of Data source:

  • Lucene based search index data sources.
  • Stroom’s SQL Statistics data sources.
  • Searchable data sources for searching the internals of Stroom.

A data source will have a Doc Ref to identify it and will define the set of Fields that it presents. Each Field will have:

  • A name
  • A set of Conditions that it supports. E.g. a Feed field would likely support is but not >.
  • A flag to indicate if it is queryable or not. I.e. a queryable field could be referenced in the query expression tree and in a Dashboard table, but a non-queryable field could only be referenced in the Dashboard table.

4.3 - Data splitter

Data Splitter is a pipeline element for converting text data (e.g. CSV, fixed width, delimited, multi-line) into XML for onward processing.

4.4 - Dictionary

An entity for storing static content, e.g. lists of terms for use in a query with the in dictionary condition. They can also be used to hold arbitrary text for use in XSLT with the dictionary function.

4.5 - Doc Ref

A Doc Ref (or Document Reference) is an identifier used to identify most documents/entities in Stroom, e.g. an XSLT will have a Doc Ref.

It is comprised of the following parts:

  • UUID - A Universally Unique Identifier to uniquely identify the document/entity.
  • Type - The type of the document/entity, e.g. Index, XSLT, Dashboard, etc.
  • Name - The name given to the document/entity.

Doc Refs are used heavily in the REST API for identifying the document/entity to be acted on.
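The three parts listed above can be pictured as a small record. The Python sketch below is illustrative only: the class name, the field names and the JSON layout are assumptions and may not match the representation used by Stroom's REST API. The UUID value is the example from the UUID entry and the document name is made up.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DocRef:
    """Hypothetical model of a Doc Ref: UUID, type and name."""
    uuid: str
    type: str
    name: str


doc_ref = DocRef(
    uuid="4ffeb895-53c9-40d6-bf33-3ef025401ad3",
    type="XSLT",
    name="My Translation",  # made-up document name
)

# Serialise to JSON, e.g. for inclusion in a request body.
as_json = json.dumps(asdict(doc_ref))
```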

4.6 - Document

Typically refers to an item that can be created in the Explorer Tree, e.g. a Feed, a Pipeline, a Dashboard, etc. May also be known as an Entity.

4.7 - Document permission

Document permissions control the access that users and/or groups have to a Document.

5 - E

5.1 - Elasticsearch

Elasticsearch is an Open Source and commercial search index product. Stroom can be connected to one or more Elasticsearch clusters so that event indexing and search is handled by Elasticsearch rather than internally.

5.2 - ELFF

The Extended Log File Format. A W3C standard format for log files produced by web servers.

5.3 - Entity

Typically refers to an item that can be created in the Explorer Tree, e.g. a Feed, a Pipeline, a Dashboard, etc. May also be known as a Document.

5.4 - Event

An event is a single auditable event, e.g. a user logging in to a system. A Stream typically contains multiple events.

In a Raw Events Stream an event is typically represented as a block of XML or JSON, or a single line for CSV data. In an Events Stream an event is identified by its Event ID, which is its position in that stream (as a one-based number). The Event ID combined with a Stream ID provides a unique identifier for an event within a Stroom instance.

5.5 - Events

This is a Stream Type in Stroom. An Events stream consists of processed/cooked data that has been demarcated into individual Events.

Typically in Stroom an Events stream will contain data conforming to the event-logging XML Schema which provides a normalised form for all Raw Events to be transformed into.

5.6 - Explorer tree

The left hand navigation tree. The Explorer Tree is used for finding, opening, creating, renaming, copying, moving and deleting Documents.

It can also be used to control the access permissions of entities and folders. The tree can be filtered using the quick filter, see Finding Things for more details.

5.7 - Expression tree

A tree of expression terms that each evaluate to a boolean (True/False) value. Terms can be grouped together within an expression operator (AND, OR, NOT).

For example:

AND (
  Feed is CSV_FEED
  Type = Raw Events
)

Expression Trees are used in Processor Filters and Query expressions.
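To make the structure concrete, here is a minimal Python sketch of how such a tree could be evaluated against a single record. It is not Stroom's implementation: the class names, the two supported conditions and the treatment of NOT (true when none of its children are true) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass
class Term:
    field: str
    condition: str
    value: str


@dataclass
class Operator:
    op: str  # "AND", "OR" or "NOT"
    children: List[Union["Operator", Term]]


# Only two conditions are modelled here; both are simple equality checks.
CONDITIONS: Dict[str, Callable[[str, str], bool]] = {
    "is": lambda actual, expected: actual == expected,
    "=": lambda actual, expected: actual == expected,
}


def evaluate(node, record: Dict[str, str]) -> bool:
    """Recursively evaluate a term or operator to a boolean."""
    if isinstance(node, Term):
        return CONDITIONS[node.condition](record.get(node.field, ""), node.value)
    results = [evaluate(child, record) for child in node.children]
    if node.op == "AND":
        return all(results)
    if node.op == "OR":
        return any(results)
    if node.op == "NOT":
        return not any(results)  # assumed semantics for NOT
    raise ValueError(f"Unknown operator: {node.op}")


# The example tree from above: AND (Feed is CSV_FEED, Type = Raw Events).
tree = Operator("AND", [
    Term("Feed", "is", "CSV_FEED"),
    Term("Type", "=", "Raw Events"),
])

matches = evaluate(tree, {"Feed": "CSV_FEED", "Type": "Raw Events"})
no_match = evaluate(tree, {"Feed": "OTHER_FEED", "Type": "Raw Events"})
```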

6 - F

6.1 - Feed

A Feed is a means of organising and categorising data in Stroom. A Feed contains multiple Streams of data that have been ingested into Stroom or output by a Pipeline. Typically a Feed will contain Streams of data that are all from one system and have a common data format.

6.2 - Field

A named data Field within some form of record or entity, and where each Field can have an associated value. In Stroom, Fields can be the Fields in an Index (or other queryable Data Source) or the fields of Metadata associated with a Stream, e.g. Stream ID, Feed, creation time, etc.

6.3 - Filter

A Filter may refer to a Processor Filter or a Filter Element in a Pipeline.

6.4 - Fully Qualified Domain Name (FQDN)

The Fully Qualified Domain Name (FQDN) is the complete, unambiguous address of a device or service on the internet, specifying all domain levels including the hostname, domain name, and top-level domain. For example server57.some.domain.com.

7 - G

7.1 - Git

Git is a free and open source distributed version control system. It is used for controlling, organizing, and tracking the version history of computer files, typically text files but also any other type of file. It allows all changes made to a file to be viewed and tracked over time and for branching/merging of the repository for separate strands of work.

The source code for the Stroom software is stored in a Git repository. Stroom also uses Git for managing user content that is held in one or more Git repositories.

7.2 - Group (users)

A named group of users to which application and document permissions can be assigned. Users can belong to multiple groups. A Group can belong to multiple groups. Groups allow permissions to be assigned to the group such that members of that group inherit those permissions.
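The inheritance described above can be sketched as a walk over group memberships. This Python example is purely illustrative and is not how Stroom computes permissions: the user name, group names and data structures are hypothetical, and the permission names are borrowed from the Application permission entry.

```python
from typing import Dict, Set

# Hypothetical data: the groups each user or group directly belongs to,
# and the permissions granted directly to each group.
memberships: Dict[str, Set[str]] = {
    "jbloggs": {"analysts"},
    "analysts": {"all-staff"},
}
granted: Dict[str, Set[str]] = {
    "analysts": {"Pipeline Stepping"},
    "all-staff": {"Data - View"},
}


def effective_permissions(member: str) -> Set[str]:
    """Collect permissions from the member and every group it belongs to."""
    permissions: Set[str] = set()
    seen: Set[str] = set()
    stack = [member]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cyclic group membership
        seen.add(current)
        permissions |= granted.get(current, set())
        stack.extend(memberships.get(current, set()))
    return permissions


perms = effective_permissions("jbloggs")
```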

8 - H

9 - I

9.1 - Identity Provider (IDP)

An Identity Provider is a system or service that can authenticate a user and assert their identity. Identity providers can support single sign on (SSO), which allows the user to sign in once to the Identity Provider so they are then authenticated to all systems using that IDP.

Examples of identity providers are Google, Cognito, Keycloak and Microsoft Azure/Entra AD. Stroom has its own built in IDP or can be configured to use a 3rd party IDP.

9.2 - Index

A Data Source that is backed by a Lucene based search index.

9.3 - IP address

The Internet Protocol (IP) address, e.g. 192.168.0.1. Typically an IP address is assumed to be an IPv4 address.

9.4 - ISO 8601

This is an international standard for representing dates, times and durations. By default Stroom displays date/times in ISO 8601.

Valid examples of ISO 8601 dates/times are:

2010-01-01T23:59Z
2010-01-01T23:59:59Z
2010-01-01T23:59:59.123Z
2010-01-01T23:59:59+02:00
2010-01-01T23:59:59.123+02
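As a hedged illustration, some of the forms above can be parsed with Python's standard library. The helper `parse_iso` is hypothetical and deliberately limited: it only handles values that include seconds and a `Z` or `+hh:mm` offset, so the minute-precision form and the short `+02` offset shown above are not covered.

```python
from datetime import datetime, timedelta, timezone


def parse_iso(value: str) -> datetime:
    """Parse a subset of ISO 8601: seconds required, offset 'Z' or '+hh:mm'."""
    if "." in value:
        fmt = "%Y-%m-%dT%H:%M:%S.%f%z"  # with fractional seconds
    else:
        fmt = "%Y-%m-%dT%H:%M:%S%z"
    return datetime.strptime(value, fmt)


utc_time = parse_iso("2010-01-01T23:59:59Z")
millis_time = parse_iso("2010-01-01T23:59:59.123Z")
offset_time = parse_iso("2010-01-01T23:59:59+02:00")

# Convert the +02:00 time to UTC for comparison with the others.
as_utc = offset_time.astimezone(timezone.utc)
```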

10 - J

10.1 - JAR

Java Archive is a file format for distributing Java class files, associated metadata and resource files. It is a compressed archive based on the ZIP format, so can be inspected with any tool capable of reading a ZIP file. Stroom and Stroom-Proxy are distributed as JAR files.

10.2 - JSON

JavaScript Object Notation is a file/data format for storing/transmitting structured data. It has similarities to XML but is less verbose and simpler. Stroom accepts data in JSON format and can output to JSON.

11 - K

12 - L

13 - M

13.1 - Markdown

Markdown is a simple markup language for creating rich formatted text using a text editor. Due to the simplicity of Markdown, it is still very readable in its raw form, markup included. Markdown is used in Stroom on the Documentation tab of each Document type and in the Documentation Document type.

Stroom uses the Showdown markdown converter to render users’ markdown content into formatted text.

13.2 - Metadata

Metadata refers to the data that describes the Stream data. It is sometimes referred to as just Meta.

14 - N

14.1 - Namespace

In Stroom Namespace typically refers to an XML Namespace. Namespaces are used in XML to distinguish different elements, e.g. where an XSLT is transforming XML in the records:2 Namespace into XML in the event-logging:3 Namespace.

An XSLT will define short aliases for Namespaces to make them easier to reference within the XSLT document. For example, in this snippet of an XML document, the aliases are: stroom, evt, xsl, xsi.

<xsl:stylesheet
  xmlns="event-logging:3"
  xpath-default-namespace="records:2"
  xmlns:stroom="stroom"
  xmlns:evt="event-logging:3"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  version="2.0">
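The same idea of mapping a short alias to a Namespace applies outside XSLT. The Python sketch below, which is unrelated to Stroom's own processing, looks up an element in a namespaced document using an alias map. The element names in the sample are illustrative assumptions; only the `event-logging:3` Namespace comes from the text above.

```python
import xml.etree.ElementTree as ET

xml_text = """
<Events xmlns="event-logging:3">
  <Event>
    <EventTime>
      <TimeCreated>2010-01-01T23:59:59Z</TimeCreated>
    </EventTime>
  </Event>
</Events>
"""

# The alias ("evt") is local to this code; only the Namespace itself has to
# match the one declared in the document.
ns = {"evt": "event-logging:3"}

root = ET.fromstring(xml_text.strip())
created = root.find("evt:Event/evt:EventTime/evt:TimeCreated", ns)
```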

15 - O

16 - P

16.1 - Parser

A Parser is a Pipeline element for parsing Raw Events into a structured form. For example, the Data Splitter Parser parses text data into Records and Fields.

16.2 - Pipeline

A Pipeline is an entity that is constructed to take a single input of stream data and process/transform it to produce one or more outputs. A Pipeline can have many elements within it to read, process or transform the data flowing through it.

16.3 - Pipeline element

An element within a Pipeline that performs some action on the data flowing through it.

16.4 - Processor

A Processor belongs to a Pipeline. It controls the processing of data through its parent Pipeline using one or more Processor Filters.

The Processor can be enabled/disabled to enable/disable the processing of data through the Pipeline. A processor will have one or more Processor Filters associated with it.

16.5 - Processor filter

A Processor Filter is used to find Streams to process through the Pipeline associated with the Processor Filter. A Processor Filter consists of an expression tree to select which Streams to process and a tracker to track which Streams have been processed.

For example a typical Processor Filter would have an Expression Tree that selected all Streams of type Raw Events in a particular Feed. A filter could also select a single Stream by its ID, e.g. when Re-processing a Stream.

A Pipeline can have multiple Processor Filters. Filters can be enabled/disabled independently of their parent Processor to control processing.

16.6 - Property

A configuration Property for configuring Stroom. Properties can be set in the user interface or via the config.yml configuration file.

17 - Q

17.1 - Query

The search Query in a Dashboard that selects the data to display. The Query is constructed using an Expression Tree of terms.

18 - R

18.1 - Raw Events

This is a Stream Type used for Streams received by Stroom. Streams received by Stroom will be in a variety of text formats (CSV, delimited, fixed width, XML, JSON, etc.). Until they have been processed by a pipeline they are essentially just unstructured character data with no concept of what is a record/event. A Parser in a pipeline is required to provide the demarcation between records/events.

18.2 - Re-processing

The act of repeating the processing of a set of input data (Stream) that has already been processed at least once. Re-Processing can be done for an individual Stream or multiple Streams using a Processor Filter.

18.3 - Records

This is a Stream Type for Streams containing data conforming to the records:2 XML Schema. It also refers more generally to any XML conforming to the records:2 XML Schema which is used in a number of places in Stroom, including as the output format for the DSParser and input for the IndexingFilter.

18.4 - REST

REST (Representational State Transfer) is essentially an architectural style that dictates how data should be handled and “transferred” across a network. REST APIs typically use JSON to send data between the client and the server, and the HTTP methods GET, PUT, PATCH, POST and DELETE.

19 - S

19.1 - Search extraction

The process of extracting un-indexed Field values from the source Event to be used in search results.

19.2 - Searchable

A Searchable is the term given to the special searchable data sources that appear at the root of the explorer tree picker when selecting a data source. These data sources are special internal data sources that are not user managed content, unlike an Index. They provide the means to search various aspects of Stroom’s internals, such as the Meta Store or Processor Tasks.

19.3 - Stepper

The Stepper is a tool in Stroom for developing and debugging a Pipeline. It allows the user to simulate passing a Stream through a pipeline with the ability to step from one record/event to the next or to jump to records/events based on filter criteria.

The parsers and translations can be edited while in the Stepper with the element output updating to show the effect of the change. The stepper will not write data to the file system or stream stores.

19.4 - Stream

A Stream is the unit of data that Stroom works with and will typically contain many Events.

19.5 - Stream Type

All Streams must have a Stream Type. The list of Stream Types is configured using the Property stroom.data.meta.metaTypes.

Additional Stream Types can be added; however, the list of Stream Types must include the following built-in types:

  • Context
  • Error
  • Events
  • Meta
  • Raw Events
  • Raw Reference
  • Reference

Some Stream Types, such as Meta and Context, only exist as child streams within another Stream.

19.6 - StroomQL

Stroom Query Language is Stroom’s own query language. It has similarities with Structured Query Language (SQL) as used in databases. StroomQL is sometimes referred to as sQL to distinguish it from SQL.

20 - T

20.1 - Table

A Table is the tabular part of a Dashboard or Query that contains the data.

20.2 - Transport Layer Security (TLS)

Transport Layer Security (TLS) is the evolution of Secure Sockets Layer (SSL) and refers to the encryption of traffic between client and server.

TLS is typically used in Stroom for communications between Stroom-Proxy and Stroom, between Stroom nodes and when communicating with external systems (e.g. an Elasticsearch cluster or an HttpPostFilter destination).

20.3 - Token

Typically refers to an authentication token that may be used for user authentication. A Stroom API Key is a form of authentication token.

Tokens are generally set in the HTTP header Authorization with a value of the form Bearer TOKEN_GOES_HERE. Tokens may contain information, e.g. a JSON Web Token (JWT), or simply be long strings of random characters (to essentially make a very secure password), like API Keys.

Tokens are associated with a Stroom User so have the same or fewer permissions than that user. Tokens also typically have an expiry time after which they will no longer work.
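A minimal Python sketch of setting the Authorization header described above is shown below. The URL is a hypothetical placeholder, not a real Stroom endpoint, and the request is only constructed, not sent, so the example has no network dependency.

```python
import urllib.request

API_URL = "https://stroom.example.com/api/some/resource"  # hypothetical URL
TOKEN = "TOKEN_GOES_HERE"  # placeholder; load real tokens from a secure store

request = urllib.request.Request(API_URL, method="GET")
request.add_header("Authorization", f"Bearer {TOKEN}")

# urllib.request.urlopen(request) would send the request; it is left out
# here because the URL above does not exist.
```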

20.4 - Tracker

A Tracker is associated with a Processor Filter and keeps track of the Streams that the Processor Filter has already processed.

21 - U

21.1 - Unix Epoch

The Unix epoch is 00:00:00 UTC on 1st January 1970. Some timestamps in Stroom are represented as the number of milliseconds since the Unix epoch, e.g. 1738331628276, and may be referred to as epoch ms or epoch milliseconds.
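For illustration, the conversion between epoch milliseconds and a UTC date/time can be sketched in Python as follows. The helper names are hypothetical. Integer arithmetic with `timedelta` is used rather than floating point division so that the milliseconds are preserved exactly.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_ms(epoch_ms: int) -> datetime:
    """Convert milliseconds since the Unix epoch to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=epoch_ms)


def to_epoch_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to milliseconds since the epoch."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


moment = from_epoch_ms(1738331628276)
iso = moment.isoformat(timespec="milliseconds")
# iso is "2025-01-31T13:53:48.276+00:00"
```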

21.2 - User

Refers to a Stroom User that is linked to either an Account in Stroom’s internal Identity Provider or a user account in an external Identity Provider. A Stroom User is only concerned with authorisation (i.e. application/document permissions and group memberships), and not authentication.

21.3 - Coordinated Universal Time (UTC)

Coordinated Universal Time (UTC), also known as Zulu time, is the international standard by which the world regulates clocks and time. It is essentially a successor to Greenwich Mean Time (GMT). UTC has the time zone offset of +00:00 and does not change for daylight saving. All international time zones are relative to UTC.

Stroom currently works internally in UTC, though it is possible to change the display time zone via User Preferences to display times in another time zone.

21.4 - UUID

A Universally Unique Identifier for uniquely identifying something. UUIDs are used as the identifier in Doc Refs. An example of a UUID is 4ffeb895-53c9-40d6-bf33-3ef025401ad3.
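As a small illustration using Python's standard `uuid` module (not Stroom code), the example UUID above can be parsed and a new random one generated:

```python
import uuid

# Parse the example UUID; an invalid string would raise ValueError.
parsed = uuid.UUID("4ffeb895-53c9-40d6-bf33-3ef025401ad3")

# Generate a new random (version 4) UUID.
generated = uuid.uuid4()
```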

22 - V

22.1 - Visualisation

A document comprising some JavaScript code for visualising data, e.g. pie charts, heat maps, line graphs etc. Visualisations are not baked into Stroom, they are content, so can be created/modified/shared by Stroom users.

22.2 - Volume

In Stroom a Volume is a logical storage area that Stroom can write data to. Volumes are associated with a path on a file system that can either be local to the Stroom node or on a shared file system.

Stroom has two types of Volume: Index Volumes and Data Volumes.

  • Index Volume - Where the Lucene Index Shards are written to. An Index Volume must belong to a Volume group.
  • Data Volume - Where streams are written to. When writing Stream data Stroom will pick a data volume using a volume selector as configured by the Property stroom.data.filesystemVolume.volumeSelector.

22.3 - Volume group

A Volume Group is a collection of one or more Index Volumes. Index volumes must belong to a volume group and Indexes are configured to write to a particular Volume Group.

When Stroom is writing data to a Volume Group it will choose which of the Volumes in the group to write to using a volume selector as configured by the Property stroom.volumes.volumeSelector.

23 - W

24 - X

24.1 - XML

Extensible Markup Language is a markup language for storing/transmitting structured data. It is the working format for most Pipeline processing in Stroom and is the standard normalised format for event data.

24.2 - XML Schema

XML Schema is a language used to define the permitted structure of an XML document. An XML Schema can be used to validate an XML document to ensure it conforms to that schema such that onward processing of the XML document can be done with confidence that the document is correct.

The event-logging XML Schema is an example of an XML Schema.

24.3 - XPath

XPath is an expression language for selecting a node or nodes in an XML document. It is used heavily in XSLT to define the match criteria for templates and to select values.
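A short Python sketch of selecting nodes with an XPath expression follows. Note the assumptions: Python's `ElementTree` supports only a limited subset of XPath, so it is not equivalent to the XPath available in XSLT, and the element and attribute names in the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

xml_text = """
<records>
  <record>
    <data name="user" value="jbloggs"/>
    <data name="action" value="login"/>
  </record>
  <record>
    <data name="user" value="asmith"/>
    <data name="action" value="logout"/>
  </record>
</records>
"""

root = ET.fromstring(xml_text.strip())

# Select every <data> element under a <record> whose name attribute is "user".
user_nodes = root.findall("./record/data[@name='user']")
users = [node.get("value") for node in user_nodes]
```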

24.4 - XSLT

Extensible Stylesheet Language Transformations is a language for transforming XML documents into other XML documents. XSLTs are the primary means of transforming data in Stroom.

All data is converted into a basic form of XML and then XSLTs are used to decorate and transform it into a common form. XSLTs are also used to transform XML Events data into non-XML forms or XML with a different schema for indexing, statistics or for sending to other systems.

25 - Y

25.1 - YAML

YAML Ain’t Markup Language. A human readable data format often used for configuration files. YAML is used in Stroom for various things, e.g. Stroom & Stroom Proxy’s main configuration file, Content Store definition files. YAML files will typically have the file extension .yaml or .yml.

26 - Z

26.1 - ZIP

A compressed file format for storing one or more files with an associated directory structure. Stroom and Stroom Proxy use the ZIP format for exporting content and data as well as its Proxy ZIP format for holding multiple streams of data with associated meta data.
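As a hedged illustration of holding data with associated meta data in a single ZIP, the Python sketch below writes two entries to an in-memory archive and reads them back. The entry names and the meta data content are hypothetical and are not a specification of Stroom's Proxy ZIP format.

```python
import io
import zipfile

buffer = io.BytesIO()

# Write a data entry and a matching meta entry into an in-memory ZIP.
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("001.dat", "user,action\njbloggs,login\n")
    archive.writestr("001.meta", "Feed:CSV_FEED\n")

# Re-open the archive from its bytes and read the entries back.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    names = archive.namelist()
    meta = archive.read("001.meta").decode("utf-8")
```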