Data Sources

Stroom has several types of data source that can be queried using Dashboards, Queries and Analytic Rules.

1 - Lucene Index Data Source

Stroom’s own Lucene based index for indexing and searching its stream data.

Stroom’s primary data source is its internal Lucene based search indexes. For details of how data is indexed see Lucene Indexes.

2 - Statistics

Using Stroom’s statistic stores as a data source.

3 - Elasticsearch

Using Elasticsearch as a data source.

Stroom can integrate with external Elasticsearch indexes to allow querying using Stroom’s various mechanisms for querying data sources. These indexes may have been populated using a Stroom pipeline (See here).

Searching using a Stroom dashboard

Searching an Elasticsearch index (or data stream) using a Stroom dashboard is conceptually similar to the process described in Dashboards.

Before you set the dashboard’s data source, you must first create an Elastic Index document to tell Stroom which index (or indices) you wish to query.

Create an Elastic Index document

  1. Right-click a folder in the Stroom Explorer pane.
  2. Select:
    New
    Elastic Index
  3. Enter a name for the index document and click OK.
  4. Click next to the Cluster configuration field label.
  5. In the dialog that appears, select the Elastic Cluster document where the index exists, and click OK.
  6. Enter the name of an index or data stream in Index name or pattern. Data view (formerly known as index pattern) syntax is supported, which enables you to query multiple indices or data streams at once. For example: stroom-events-v1.
  7. (Optional) Set Search slices, the number of parallel workers that will query the index. For very large indices, increasing this value up to the number of shards can improve scroll performance, allowing you to download results faster.
  8. (Optional) Set Search scroll size, the number of documents to return in each search response. Larger values are generally more efficient. By default, Elasticsearch limits this to 10,000.
  9. Click Test Connection. A dialog will appear with the result, stating Connection Success if the connection succeeded and the index pattern matched one or more indices.
  10. Click OK.
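The index pattern accepted in step 6 behaves like a shell-style glob, and several names or patterns can be supplied as a comma-separated list. As a rough local illustration (no live cluster involved; the index names here are hypothetical), the matching can be approximated with Python's fnmatch:

```python
from fnmatch import fnmatch

def matching_indices(pattern: str, index_names: list[str]) -> list[str]:
    """Approximate Elasticsearch index-pattern matching locally.

    A pattern may be a comma-separated list; '*' matches any run of
    characters, as in shell globs.
    """
    patterns = [p.strip() for p in pattern.split(",")]
    return [name for name in index_names
            if any(fnmatch(name, p) for p in patterns)]

# Hypothetical indices present in the cluster.
indices = ["stroom-events-v1", "stroom-events-v2", "stroom-audit-v1"]

print(matching_indices("stroom-events-v1", indices))              # exact name
print(matching_indices("stroom-events-*", indices))               # wildcard
print(matching_indices("stroom-events-v1,stroom-audit-v1", indices))  # comma-separated list
```

This is only a sketch of the matching rules; the real resolution is performed by Elasticsearch itself when Stroom runs the query.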

Set the Elastic Index document as the dashboard data source

  1. Open or create a dashboard.
  2. Click in the Query panel.
  3. Click next to the Data Source field label.
  4. Select the Elastic Index document you created and click OK.
  5. Configure the query expression as explained in Dashboards. Note the tips for particular Elasticsearch field mapping data types.
  6. Configure the table.

Query expression tips

Certain Elasticsearch field mapping types support special syntax when used in a Stroom dashboard query expression.

To identify the field mapping type for a particular field:

  1. Click in the Query panel to add a new expression item.
  2. Select the Elasticsearch field name in the drop-down list.
  3. Note the blue data type indicator to the far right of the row. Common examples are: keyword, text and number.

After you identify the field mapping type, move the mouse cursor over the mapping type indicator. A tooltip appears, explaining various types of queries you can perform against that particular field’s type.

Searching multiple indices

Using data view (index pattern) syntax, you can create powerful dashboards that query multiple indices at a time. An example of this is where you have multiple indices covering different types of email systems. Let’s assume these indices are named: stroom-exchange-v1, stroom-domino-v1 and stroom-mailu-v1.

There is a common set of fields across all three indices: @timestamp, Subject, Sender and Recipient. You want to allow search across all indices at once, in effect creating a unified email dashboard.

You can achieve this by creating an Elastic Index document called (for example) Elastic-Email-Combined and setting the property Index name or pattern to: stroom-exchange-v1,stroom-domino-v1,stroom-mailu-v1. Save the document and re-open the dashboard. You’ll notice that the available fields are a union of the fields across all three indices. You can now search by any of these - in particular, the fields common to all three.
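The field list the dashboard offers in this situation can be illustrated with a small sketch. The mappings below are hypothetical, heavily reduced versions of what each email index might declare:

```python
# Hypothetical field sets for the three email indices.
mappings = {
    "stroom-exchange-v1": {"@timestamp", "Subject", "Sender", "Recipient", "ExchangeServer"},
    "stroom-domino-v1":   {"@timestamp", "Subject", "Sender", "Recipient", "DominoReplicaId"},
    "stroom-mailu-v1":    {"@timestamp", "Subject", "Sender", "Recipient"},
}

# The dashboard exposes the union of fields across all matched indices...
all_fields = set().union(*mappings.values())

# ...while only the intersection is guaranteed to exist in every index.
common_fields = set.intersection(*mappings.values())

print(sorted(common_fields))  # ['@timestamp', 'Recipient', 'Sender', 'Subject']
```

Querying on a field outside the common set simply matches nothing in the indices that lack it, which is why filtering on the shared fields gives the unified-dashboard behaviour described above.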

4 - Internal Data Sources

A set of data sources for querying the inner workings of Stroom.

Stroom provides a number of built-in data sources for querying the inner workings of Stroom. These data sources do not have a corresponding Document, so they do not feature in the explorer tree.

These data sources appear as children of the root folder when selecting a data source in a Dashboard or View. They are also available in the list of data sources when editing a Query.

Analytics

Annotations

Annotations are a means of annotating search results with additional information and for assigning those annotations to users. The Annotations data source allows you to query the annotations that have been created.

| Field | Type | Description |
| --- | --- | --- |
| annotation:Id | Long | Annotation unique identifier. |
| annotation:CreatedOn | Date | Date created. |
| annotation:CreatedBy | String | Username of the user that created the annotation. |
| annotation:UpdatedOn | Date | Date last updated. |
| annotation:UpdatedBy | String | Username of the user that last updated the annotation. |
| annotation:Title | String | The annotation title. |
| annotation:Subject | String | The annotation subject. |
| annotation:AssignedTo | String | Username the annotation is assigned to. |
| annotation:Comment | String | Any comments on the annotation. |
| annotation:History | String | History of changes to the annotation. |

Dual

The Dual data source is one with a single field that always returns one row with the same value. This data source can be useful for testing expression functions. It can also be useful when combined with an extraction pipeline that uses the stroom:http-call() XSLT function in order to make a single HTTP call using Dashboard parameter values.

| Field | Type | Description |
| --- | --- | --- |
| Dummy | String | Always one row that has the value X. |
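A minimal sketch of why a one-row source is useful: any expression evaluated against it runs exactly once, which makes it a convenient harness for trying out functions. The expression here is an arbitrary Python stand-in for a Stroom dashboard expression, not Stroom's own expression language:

```python
# The Dual data source: a single row with a single field.
DUAL = [{"Dummy": "X"}]

def evaluate(expression, rows):
    """Apply an expression to every row of a data source."""
    return [expression(row) for row in rows]

# Because Dual has exactly one row, the expression is evaluated exactly
# once - handy for testing a function without querying any real data.
result = evaluate(lambda row: row["Dummy"] * 3, DUAL)
print(result)  # ['XXX']
```

The same one-row property is what makes Dual suitable for driving a single stroom:http-call() invocation from an extraction pipeline, as described above.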

Index Shards

Exposes the details of the index shards that make up Stroom’s Lucene based index. Each index is split up into one or more partitions and each partition is further divided into one or more shards. Each row represents one index shard.

| Field | Type | Description |
| --- | --- | --- |
| Node | String | The name of the node that the index belongs to. |
| Index | String | The name of the index document. |
| Index Name | String | The name of the index document. |
| Volume Path | String | The file path for the index shard. |
| Volume Group | String | The name of the volume group the index is using. |
| Partition | String | The name of the partition that the shard is in. |
| Doc Count | Integer | The number of documents in the shard. |
| File Size | Long | The size of the shard on disk in bytes. |
| Status | String | The status of the shard (Closed, Open, Closing, Opening, New, Deleted, Corrupt). |
| Last Commit | Date | The time and date of the last commit to the shard. |
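Because each row is a single shard, per-index or per-partition figures come from aggregating shard rows, mirroring a dashboard table grouped on those fields. A small sketch with made-up rows:

```python
from collections import defaultdict

# Hypothetical rows from the Index Shards data source.
shards = [
    {"Index Name": "Event Index", "Partition": "2024-01", "Doc Count": 1000},
    {"Index Name": "Event Index", "Partition": "2024-01", "Doc Count": 2000},
    {"Index Name": "Event Index", "Partition": "2024-02", "Doc Count": 500},
]

# Sum document counts per (index, partition), as a grouped dashboard
# table over this data source would.
totals = defaultdict(int)
for shard in shards:
    totals[(shard["Index Name"], shard["Partition"])] += shard["Doc Count"]

print(dict(totals))
# {('Event Index', '2024-01'): 3000, ('Event Index', '2024-02'): 500}
```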

Meta Store

Exposes details of the streams held in Stroom’s stream (aka meta) store. Each row represents one stream.

| Field | Type | Description |
| --- | --- | --- |
| Feed | String | The name of the feed the stream belongs to. |
| Pipeline | String | The name of the pipeline that created the stream. [Optional] |
| Pipeline Name | String | The name of the pipeline that created the stream. [Optional] |
| Status | String | The status of the stream (Unlocked, Locked, Deleted). |
| Type | String | The Stream Type, e.g. Events, Raw Events, etc. |
| Id | Long | The unique ID (within this Stroom cluster) for the stream. |
| Parent Id | Long | The unique ID (within this Stroom cluster) for the parent stream, e.g. the Raw stream that spawned an Events stream. [Optional] |
| Processor Id | Long | The unique ID (within this Stroom cluster) for the processor that produced this stream. [Optional] |
| Processor Filter Id | Long | The unique ID (within this Stroom cluster) for the processor filter that produced this stream. [Optional] |
| Processor Task Id | Long | The unique ID (within this Stroom cluster) for the processor task that produced this stream. [Optional] |
| Create Time | Date | The time the stream was created. |
| Effective Time | Date | The time that the data in this stream is effective for. This is only used for reference data streams and is the time that the snapshot of reference data was captured. [Optional] |
| Status Time | Date | The time that the status was last changed. |
| Duration | Long | The time it took to process the stream in milliseconds. [Optional] |
| Read Count | Long | The number of records read in segmented streams. [Optional] |
| Write Count | Long | The number of records written in segmented streams. [Optional] |
| Info Count | Long | The number of INFO messages. |
| Warning Count | Long | The number of WARNING messages. |
| Error Count | Long | The number of ERROR messages. |
| Fatal Error Count | Long | The number of FATAL_ERROR messages. |
| File Size | Long | The compressed size of the stream on disk in bytes. |
| Raw Size | Long | The uncompressed size of the stream on disk in bytes. |

Processor Tasks

Exposes details of the tasks spawned by the processor filters. Each row represents one processor task.

| Field | Type | Description |
| --- | --- | --- |
| Create Time | Date | The time the task was created. |
| Create Time Ms | Long | The time the task was created (milliseconds). |
| Start Time | Date | The time the task was executed. |
| Start Time Ms | Long | The time the task was executed (milliseconds). |
| End Time | Date | The time the task finished. |
| End Time Ms | Long | The time the task finished (milliseconds). |
| Status Time | Date | The time the status of the task was last updated. |
| Status Time Ms | Long | The time the status of the task was last updated (milliseconds). |
| Meta Id | Long | The unique ID (within this Stroom cluster) of the stream the task was for. |
| Node | String | The name of the node that the task was executed on. |
| Pipeline | String | The name of the pipeline that spawned the task. |
| Pipeline Name | String | The name of the pipeline that spawned the task. |
| Processor Filter Id | Long | The ID of the processor filter that spawned the task. |
| Processor Filter Priority | Integer | The priority of the processor filter when the task was executed. |
| Processor Id | Long | The unique ID (within this Stroom cluster) of the pipeline processor that spawned this task. |
| Feed | String | The name of the feed the stream belongs to. |
| Status | String | The status of the task (Created, Queued, Processing, Complete, Failed, Deleted). |
| Task Id | Long | The unique ID (within this Stroom cluster) of this task. |

Reference Data Store

Reference data is written to a persistent cache on storage local to each node. This data source exposes the data held in the store on the local node only. Because most Stroom deployments are clustered and the UI nodes typically do no processing, the UI node will often hold no reference data.

Task Manager

This data source exposes the background tasks currently running across the Stroom cluster. Each row represents a single background server task.

Requires the Manage Tasks application permission.

| Field | Type | Description |
| --- | --- | --- |
| Node | String | The name of the node that the task is running on. |
| Name | String | The name of the task. |
| User | String | The user name of the user that the task is running as. |
| Submit Time | Date | The time the task was submitted. |
| Age | Duration | The time the task has been running for. |
| Info | String | The latest information message from the task. |