Glossary

Glossary of common words and terms used in Stroom.

Account

Refers to a user account in Stroom’s internal Identity Provider.

See User Accounts for more detail.

API

Application Programming Interface. An interface that one system can present so other systems can use it to communicate. Stroom has a number of APIs, e.g. its many REST APIs and its /datafeed interface for data receipt.

API Key

A Token created by Stroom’s internal Identity Provider for authenticating with the API. It is an encrypted string that contains details of the user and the expiration date of the token. Possession of a valid API Key for a user account means that you can do anything via the API that the user can do in the user interface. API Keys should therefore be protected carefully and treated like a password. If you are using an external Identity Provider then tokens for use with the API are generated by the external Identity Provider.

Application Permission

This is a permission that is not specific to a single document. It applies to all documents or is not related to documents in any way.

Application permissions are generally associated with a screen or functional area of the Stroom application. Many of the application permissions are mainly applicable to system administrators, but they allow fine grained control of the different functional areas in Stroom so that these functions can be devolved to other users. Examples of application permissions are Manage Users, Pipeline Stepping and Data - View.

See Application Permissions for more detail.

Byte Order Mark

A special Unicode character at the start of a text stream that indicates the byte order (or endianness) of the stream.

See Byte Order Mark for more detail.
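
As an illustration of how a BOM can be used when reading data, the following minimal Python sketch checks the leading bytes of a byte string against the BOM constants in the standard library. The function name detect_bom is hypothetical and this is not how Stroom itself detects BOMs.

```python
import codecs

# BOM byte sequences from the standard library. The UTF-32 entries come
# first so that UTF-32 LE (FF FE 00 00) is not mistaken for UTF-16 LE (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no BOM is present."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


# Example: UTF-8 data that starts with a BOM.
raw = codecs.BOM_UTF8 + "hello".encode("utf-8")
encoding, bom_length = detect_bom(raw)
text = raw[bom_length:].decode(encoding)  # strip the BOM before decoding
```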

Character Encoding

Character encoding is the means of encoding character data (i.e. text) into binary form. Therefore, to decode character data from a stream of bytes, the character encoding must be known (or guessed).

Common examples of character encodings are ASCII, UTF-8 and UTF-16.

Each Feed has a defined character encoding for the data and context. This allows Stroom to decode the data sent into that Feed.
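
The following short Python sketch (not Stroom code) shows why the encoding must be known: the same text produces different bytes in different encodings, and decoding with the wrong encoding yields different characters without raising an error.

```python
text = "caf\u00e9"  # "café"

utf8_bytes = text.encode("utf-8")       # the accented character takes two bytes
utf16_bytes = text.encode("utf-16-le")  # every character here takes two bytes

# Decoding with the encoding that was used to write the bytes recovers the text.
round_trip = utf8_bytes.decode("utf-8")

# Decoding with a different encoding succeeds but produces the wrong characters.
wrong = utf8_bytes.decode("latin-1")
```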

Condition

A Condition in a query expression term, e.g. =, >, in, etc.

Content

Content in Stroom typically means the documents/entities created in Stroom, as seen in the explorer tree. Content can be created/modified by Stroom users and imported/exported for sharing between different Stroom instances.

Context Data

This is an additional stream of contextual data that is sent alongside the main event stream. It provides a means for the sending system to send additional data that relates only to the event stream it is sent alongside. This can be useful where the sending system has no control over the data in the event stream and the event stream does not contain contextual information such as what machine it is running on or the location of that machine.

The contextual information (such as hostname, FQDN, physical location, etc.) can be sent in a Context Stream so that the two can be combined together during pipeline processing using stroom:lookup().

See Context Data, Stream Concepts and stroom:lookup() for more details.

Cron

cron is a command line utility found on most Linux/Unix systems that is used for scheduling background tasks. Cron expressions (or variants of them) are widely used in other schedulers.

Stroom uses a scheduler called Quartz which supports cron expressions for scheduling. The full details of the cron syntax supported by Quartz can be found in the Quartz documentation.

See Cron Syntax for more detail.
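
As a rough illustration, Quartz cron expressions have six mandatory space-separated fields plus an optional year field. The Python sketch below simply labels the fields of an expression; the function name split_quartz_cron is hypothetical and the sketch does not validate the field values.

```python
# Field order used by Quartz cron expressions; the year field is optional.
QUARTZ_FIELDS = [
    "seconds", "minutes", "hours", "day_of_month",
    "month", "day_of_week", "year",
]


def split_quartz_cron(expression: str) -> dict:
    """Label each field of a Quartz-style cron expression (no validation)."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    return dict(zip(QUARTZ_FIELDS, parts))


# "0 0/10 * * * ?" fires every ten minutes, on the minute.
fields = split_quartz_cron("0 0/10 * * * ?")
```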

CSV

Comma Separated Values is a file format with typically one record per line and fields delimited by a ,. Fields may optionally be enclosed in double quotes, though there is no fixed standard for CSV data, particularly when it comes to escaping of double quotes and/or commas.
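
The sketch below uses Python's standard csv module to show one common convention: fields containing commas are enclosed in double quotes, and a double quote inside a quoted field is escaped by doubling it. The sample data is made up, and other CSV producers may use different escaping rules.

```python
import csv
import io

# Made-up sample data: a header row and two records.
raw = (
    "id,user,message\n"
    '1,jbloggs,"Login failed, retrying"\n'
    '2,asmith,"Said ""hello"""\n'
)

# csv.reader handles the quoting and the doubled double quotes.
rows = list(csv.reader(io.StringIO(raw)))
```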

Dashboard

A Dashboard is a configurable entity for querying one or more Data Sources and displaying the results as a table, a visualisation or some other form.

See the User Guide for more detail.

Data Source

The source of data for a Query, e.g. a Lucene based Index, a SQL Statistics data source, etc. There are three types of data source:

  • Lucene based search index data sources.
  • Stroom’s SQL Statistics data sources.
  • Searchable data sources for searching the internals of Stroom.

A data source will have a DocRef to identify it and will define the set of Fields that it presents. Each Field will have:

  • A name
  • A set of Conditions that it supports. E.g. a Feed field would likely support is but not >.
  • A flag to indicate if it is queryable or not. I.e. a queryable field could be referenced in the query expression tree and in a Dashboard table, but a non-queryable field could only be referenced in the Dashboard table.
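
The three properties of a Field listed above could be modelled roughly as follows. This is an illustrative Python sketch only; the class, attribute and example field names are assumptions and do not reflect Stroom's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical model of a data source field."""
    name: str
    conditions: frozenset   # the Conditions the field supports
    queryable: bool = True  # whether it can be used in the expression tree

    def supports(self, condition: str) -> bool:
        return condition in self.conditions


# A Feed field would likely support "is" but not ">".
feed_field = Field("Feed", frozenset({"is", "in"}))
# A non-queryable field that could only be used in a Dashboard table.
display_only = Field("Description", frozenset(), queryable=False)
```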

Data Splitter

Data Splitter is a pipeline element for converting text data (e.g. CSV, fixed width, delimited, multi-line) into XML for onward processing.

See the User Guide or the Pipeline Element Reference for more detail.

Dictionary

An entity for storing static content, e.g. lists of terms for use in a query with the in dictionary condition. Dictionaries can also be used to hold arbitrary text for use in XSLTs with the dictionary() function.

DocRef

A DocRef is an identifier used to identify most documents/entities in Stroom, e.g. an XSLT will have a DocRef. It comprises the following parts:

  • UUID - A Universally Unique Identifier to uniquely identify the document/entity.
  • Type - The type of the document/entity, e.g. Index, XSLT, Dashboard, etc.
  • Name - The name given to the document/entity.

DocRefs are used heavily in the REST API for identifying the document/entity to be acted on.
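
A DocRef can be thought of as a small three-part record. The Python sketch below models one and serialises it to JSON; the key names uuid, type and name are assumed from the parts listed above, and the document name is made up, so check the REST API documentation for the exact form.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DocRef:
    """Illustrative model of a DocRef; key names are assumptions."""
    uuid: str
    type: str
    name: str


doc_ref = DocRef(
    uuid="4ffeb895-53c9-40d6-bf33-3ef025401ad3",
    type="XSLT",
    name="My XSLT",  # hypothetical document name
)

# Serialise to JSON, e.g. for inclusion in a request body.
payload = json.dumps(asdict(doc_ref))
```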

Document

Typically refers to an item that can be created in the Explorer Tree, e.g. a Feed, a Pipeline, a Dashboard, etc. May also be known as an Entity.

Document Permission

Document permissions control the access that users and/or groups have to a Document.

See Document Permissions for more detail.

Elasticsearch

Elasticsearch is an Open Source and commercial search index product. Stroom can be connected to one or more Elasticsearch clusters so that event indexing and search is handled by Elasticsearch rather than internally.

Entity

Typically refers to an item that can be created in the Explorer Tree, e.g. a Feed, a Pipeline, a Dashboard, etc. May also be known as a Document.

Event

An event is a single auditable event, e.g. a user logging in to a system. A Stream typically contains multiple events.

In a Raw Stream an event is typically represented as a block of XML or JSON, or as a single line for CSV data. In an Events Stream an event is identified by its Event ID, which is its position in that stream (as a one-based number). The Event ID combined with a Stream ID provides a unique identifier for an event within a Stroom instance.

Events

This is a Stream Type in Stroom. An Events stream consists of processed/cooked data that has been demarcated into individual Events. Typically in Stroom an Events stream will contain data conforming to the event-logging XML Schema which provides a normalised form for all Raw Events to be transformed into.

Explorer Tree

The left hand navigation tree. The Explorer Tree is used for finding, opening, creating, renaming, copying, moving and deleting Entities. It can also be used to control the access permissions of entities and folders. The tree can be filtered using the quick filter, see Finding Things for more details.

Expression Tree

A tree of expression terms that each evaluate to a boolean (True/False) value. Terms can be grouped together within an expression operator (AND, OR, NOT). For example:

AND (
  Feed is CSV_FEED
  Type = Raw Events
)

Expression Trees are used in Processor Filters and Query expressions.

See also Expression functions.
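
To make the idea concrete, the following Python sketch evaluates a tree like the example above against a set of field values. The dictionary layout, the evaluate function and the treatment of NOT (true when none of its children are true) are assumptions made for illustration, not Stroom's implementation.

```python
def evaluate(node: dict, record: dict) -> bool:
    """Recursively evaluate an expression tree against a dict of field values."""
    if "op" in node:
        results = [evaluate(child, record) for child in node["children"]]
        if node["op"] == "AND":
            return all(results)
        if node["op"] == "OR":
            return any(results)
        if node["op"] == "NOT":
            return not any(results)
        raise ValueError(f"unknown operator: {node['op']}")
    # Leaf term: only equality-style conditions are handled in this sketch.
    if node["condition"] in ("is", "="):
        return record.get(node["field"]) == node["value"]
    raise ValueError(f"unsupported condition: {node['condition']}")


tree = {
    "op": "AND",
    "children": [
        {"field": "Feed", "condition": "is", "value": "CSV_FEED"},
        {"field": "Type", "condition": "=", "value": "Raw Events"},
    ],
}

matches = evaluate(tree, {"Feed": "CSV_FEED", "Type": "Raw Events"})
no_match = evaluate(tree, {"Feed": "CSV_FEED", "Type": "Events"})
```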

Feed

A Feed is a means of organising and categorising data in Stroom. A Feed contains multiple Streams of data that have been ingested into Stroom or output by a Pipeline. Typically a Feed will contain Streams of data that are all from one system and have a common data format.

Field

A named data Field within some form of record or entity, where each Field can have an associated value. In Stroom, Fields can be the Fields in an Index (or other queryable Data Source) or the fields of Meta Data associated with a Stream, e.g. Stream ID, Feed, creation time, etc.

Filter (Processor)

See Processor Filter.

FQDN

The Fully Qualified Domain Name of a device, e.g. server57.some.domain.com.

Group (Users)

A named group of users to which application and document permissions can be assigned. Users can belong to multiple groups. A Group can belong to multiple groups. Groups allow permissions to be assigned to the group such that members of that group inherit those permissions.

See Users and Groups for more detail.
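
Because groups can be members of other groups, a user's effective permissions are the union of everything granted along the membership chain. The Python sketch below illustrates this; the function, user and group names are hypothetical and this is not how Stroom stores or resolves permissions.

```python
def effective_permissions(name, memberships, granted, _seen=None):
    """Collect permissions granted to a user/group and to every group it belongs to."""
    seen = set() if _seen is None else _seen
    if name in seen:  # guard against cyclic group memberships
        return set()
    seen.add(name)
    permissions = set(granted.get(name, ()))
    for group in memberships.get(name, ()):
        permissions |= effective_permissions(group, memberships, granted, seen)
    return permissions


# Hypothetical set-up: a user in a group that is itself in another group.
memberships = {"jbloggs": ["analysts"], "analysts": ["all-users"]}
granted = {
    "analysts": {"Pipeline Stepping"},
    "all-users": {"Data - View"},
}

user_permissions = effective_permissions("jbloggs", memberships, granted)
```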

Identity Provider (IDP)

An Identity Provider is a system or service that can authenticate a user and assert their identity. Identity providers can support single sign on (SSO), which allows the user to sign in once to the Identity Provider so they are then authenticated to all systems using that IDP. Examples of identity providers are Google, Cognito, Keycloak and Microsoft Azure/Entra AD. Stroom has its own built-in IDP or can be configured to use a 3rd party IDP.

JSON

JavaScript Object Notation is a file/data format for storing/transmitting structured data. It has similarities to XML, is less verbose, but is more simplistic. Stroom accepts data in JSON format and can output to JSON.

See Wikipedia for details.

Markdown

Markdown is a markup language for creating formatted text using a simple text editor. Stroom uses the Showdown markdown converter to render users’ markdown content into formatted text.

Namespace

In Stroom, Namespace typically refers to an XML Namespace. Namespaces are used in XML to distinguish different elements, e.g. where an XSLT is transforming XML in the records:2 Namespace into XML in the event-logging:3 Namespace.

An XSLT will define short aliases for Namespaces to make them easier to reference within the XSLT document, e.g. in this example the aliases stroom, evt, xsl, xsi:

<xsl:stylesheet
  xmlns="event-logging:3"
  xpath-default-namespace="records:2"
  xmlns:stroom="stroom"
  xmlns:evt="event-logging:3"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  version="2.0">
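
The same alias idea applies when reading namespaced XML outside of XSLT. The Python sketch below maps a locally chosen alias to the records:2 Namespace in order to find elements; the alias rec and the sample document's element and attribute names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample document that declares records:2 as its default Namespace.
xml_text = """<records xmlns="records:2">
  <record><data name="user" value="jbloggs"/></record>
</records>"""

root = ET.fromstring(xml_text)

# Map a local alias to the Namespace URI; the alias name is arbitrary.
namespaces = {"rec": "records:2"}
data = root.find("rec:record/rec:data", namespaces)
value = data.get("value")
```

Without the alias mapping, the element would have to be addressed by its fully qualified name, {records:2}data.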

Parser

A Parser is a Pipeline element for parsing Raw Events into a structured form. For example, the Data Splitter Parser parses text data into Records and Fields.

See the Pipeline Element Reference for details.

Pipeline

A Pipeline is an entity that is constructed to take a single input of stream data and process/transform it with one or more outputs. A Pipeline can have many elements within it to read, process or transform the data flowing through it.

See the User Guide for more detail.

Pipeline Element

An element within a Pipeline that performs some action on the data flowing through it.

See the Pipeline Element Reference for more detail.

Processor

A Processor belongs to a Pipeline. It controls the processing of data through its parent Pipeline. The Processor can be enabled/disabled to enable/disable the processing of data through the Pipeline. A Processor will have one or more Processor Filters associated with it.

Processor Filter

A Processor Filter consists of an expression tree to select which Streams to process through its parent Pipeline. For example a typical Processor Filter would have an Expression Tree that selected all Streams of type Raw Events in a particular Feed. A filter could also select a single Stream by its ID, e.g. when Re-Processing a Stream. The filter is used to find Streams to process through the Pipeline associated with the Processor Filter. A Pipeline can have multiple Processor Filters. Filters can be enabled/disabled independently of their parent Processor to control processing.

Property

A configuration Property for configuring Stroom. Properties can be set in the user interface or via the config.yml configuration file.

See Properties for more detail.

Query

The search Query in a Dashboard that selects the data to display. The Query is constructed using an Expression Tree of terms.

See the User Guide for more detail.

Raw Events

This is a Stream Type used for Streams received by Stroom. Streams received by Stroom will be in a variety of text formats (CSV, delimited, fixed width, XML, JSON, etc.). Until they have been processed by a pipeline they are essentially just unstructured character data with no concept of what is a record/event. A Parser in a pipeline is required to provide the demarcation between records/events.

Re-Processing

The act of repeating the processing of a set of input data (Streams) that have already been processed at least once. Re-Processing can be done for an individual Stream or multiple Streams using a Processor Filter.

Records

This is a Stream Type for Streams containing data conforming to the records:2 XML Schema. It also refers more generally to XML conforming to the records:2 XML Schema which is used in a number of places in Stroom, including as the output format for the DSParser and input for the IndexingFilter.

Searchable

A Searchable is the term given to the special searchable data sources that appear at the root of the explorer tree picker when selecting a data source. These data sources are special internal data sources that are not user managed content, unlike an Index. They provide the means to search various aspects of Stroom’s internals, such as the Meta Store or Processor Tasks.

Search Extraction

The process of extracting un-indexed Field values from the source Event/Record to be used in search results.

See the User Guide for more detail.

Stream

A Stream is the unit of data that Stroom works with and will typically contain many Events.

See the User Guide for more detail.

Stream Type

All Streams must have a Stream Type. The list of Stream Types is configured using the Property stroom.data.meta.metaTypes. Additional Stream Types can be added, however the list of Stream Types must include the following built-in types:

Some Stream Types, such as Meta and Context only exist as child streams within another Stream.

StroomQL

Stroom Query Language is Stroom’s own query language. It has similarities with Structured Query Language (SQL) as used in databases. StroomQL is sometimes referred to as sQL to distinguish it from SQL.

See Stroom Query Language for more detail.

Stepper

The Stepper is a tool in Stroom for developing and debugging a Pipeline. It allows the user to simulate passing a Stream through a pipeline with the ability to step from one record/event to the next or to jump to records/events based on filter criteria. The parsers and translations can be edited while in the Stepper with the element output updating to show the effect of the change. The Stepper will not write data to the file system or stream stores.

Token

Typically refers to an authentication token that may be used for user authentication or as an API Key. Tokens are JSON Web Tokens (JWT) that are set in the HTTP header Authorization with a value of the form Bearer TOKEN_GOES_HERE. Tokens are associated with a Stroom User so have the same or fewer permissions than that user. Tokens also have an expiry time after which they will no longer work.
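
The Python sketch below shows the shape of such a header and how the claims of a JWT can be inspected. It builds a made-up, unsigned token purely for illustration; the claim values are invented, and the sketch decodes the claims without verifying the signature, so it must not be used to decide whether a token is trustworthy.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used for JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwt_claims(token: str) -> dict:
    """Decode the claims segment of a JWT WITHOUT verifying its signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped padding
    return json.loads(base64.urlsafe_b64decode(payload))


# A made-up token with header, claims and a dummy signature segment.
header = b64url(json.dumps({"alg": "none", "typ": "JWT"}).encode())
claims = b64url(json.dumps({"sub": "jbloggs", "exp": 1767225600}).encode())
token = f"{header}.{claims}.signature"

# The token is sent in the Authorization header with the Bearer prefix.
http_headers = {"Authorization": f"Bearer {token}"}
decoded = jwt_claims(token)
```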

Transport Layer Security (TLS)

TLS is the evolution of Secure Sockets Layer (SSL) and refers to the encryption of traffic between client and server. TLS is typically used in Stroom for communications between Stroom-Proxy and Stroom, between Stroom nodes and when communicating with external systems (e.g. an Elasticsearch cluster or a HttpPostFilter destination).

Tracker

A Tracker is associated with a Processor Filter and keeps track of the Streams that the Processor Filter has already processed.

User

Refers to a Stroom User that is linked to either an Account in Stroom’s internal Identity Provider or a user account in an external Identity Provider. A Stroom User is only concerned with authorisation (i.e. application/document permissions and group memberships), and not authentication.

See Users and Groups for more detail.

UTC

UTC (Coordinated Universal Time), also known as Zulu time, is the international standard by which the world regulates clocks and time. It is essentially a successor to Greenwich Mean Time (GMT). UTC has the timezone offset of +00:00. All international time zones are relative to UTC.

Stroom currently works internally in UTC, though it is possible to change the display time zone via User Preferences to display times in another timezone.
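
The distinction between the stored time and the displayed time can be sketched in Python as follows. The date and the +01:00 display offset are made up for illustration; the point is that converting for display changes the representation but not the instant.

```python
from datetime import datetime, timedelta, timezone

# A time held internally in UTC (offset +00:00).
event_time = datetime(2024, 11, 26, 9, 30, tzinfo=timezone.utc)

# A hypothetical display time zone one hour ahead of UTC.
display_zone = timezone(timedelta(hours=1))
displayed = event_time.astimezone(display_zone)

utc_text = event_time.isoformat()
display_text = displayed.isoformat()
```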

UUID

A Universally Unique Identifier for uniquely identifying something. UUIDs are used as the identifier in DocRefs. An example of a UUID is 4ffeb895-53c9-40d6-bf33-3ef025401ad3.

See the User Guide for more detail.
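
As a small illustration, Python's standard uuid module can parse the example UUID above and generate new random ones. The sketch is not specific to Stroom and does not show how Stroom generates its UUIDs.

```python
import uuid

# Parse the example UUID; an invalid string would raise ValueError.
parsed = uuid.UUID("4ffeb895-53c9-40d6-bf33-3ef025401ad3")

# Generate a new random (version 4) UUID.
fresh = uuid.uuid4()
```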

Visualisation

A document comprising some JavaScript code for visualising data, e.g. pie charts, heat maps, line graphs etc. Visualisations are not baked into Stroom, they are content, so can be created/modified/shared by Stroom users.

Volume

In Stroom a Volume is a logical storage area that Stroom can write data to. Volumes are associated with a path on a file system that can either be local to the Stroom node or on a shared file system. Stroom has two types of Volume: Index Volumes and Data Volumes.

  • Index Volume - Where the Lucene Index Shards are written to. An Index Volume must belong to a Volume Group.
  • Data Volume - Where streams are written to. When writing Stream data Stroom will pick a data volume using a volume selector as configured by the Property stroom.data.filesystemVolume.volumeSelector.

See the User Guide for more detail.

Volume Group

A Volume Group is a collection of one or more Index Volumes. Index Volumes must belong to a Volume Group and Indexes are configured to write to a particular Volume Group. When Stroom is writing data to a Volume Group it will choose which of the Volumes in the group to write to using a volume selector as configured by the Property stroom.volumes.volumeSelector.

See the User Guide for more detail.

XML

Extensible Markup Language is a markup language for storing/transmitting structured data. It is the working format for most Pipeline processing in Stroom and is the standard normalised format for event data.

XML Schema

XML Schema is a language used to define the permitted structure of an XML document. An XML Schema can be used to validate an XML document to ensure it conforms to that schema such that onward processing of the XML document can be done with confidence that the document is correct.

The event-logging XML Schema is an example of an XML Schema.

XPath

XPath is an expression language for selecting a node or nodes in an XML document. It is used heavily in XSLT to define the match criteria for templates and to select values.
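
The Python sketch below shows two simple XPath-style selections using the standard xml.etree.ElementTree module, which supports only a limited subset of XPath. The sample document and its element names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample document.
doc = ET.fromstring("""<Events>
  <Event><User>jbloggs</User><Action>Logon</Action></Event>
  <Event><User>asmith</User><Action>Logoff</Action></Event>
</Events>""")

# Select the User element of every Event.
users = [element.text for element in doc.findall("./Event/User")]

# Select the User of the Event whose Action child has the text Logon.
logon_user = doc.find("./Event[Action='Logon']/User").text
```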

XSLT

Extensible Stylesheet Language Transformations is a language for transforming XML documents into other XML documents. XSLTs are the primary means of transforming data in Stroom. All data is converted into a basic form of XML and then XSLTs are used to decorate and transform it into a common form. XSLTs are also used to transform XML Events data into non-XML forms or XML with a different schema for indexing, statistics or for sending to other systems.

See the User Guide for more detail.

Last modified November 26, 2024: Merge branch '7.5' into 7.6 (5fb86ee)