Data Sources
1 - Lucene Index Data Source
Stroom’s primary data source is its internal Lucene based search indexes. For details of how data is indexed see Lucene Indexes.
TODO
Complete this section

2 - Statistics
TODO
Complete this section

3 - Elasticsearch
Stroom can integrate with external Elasticsearch indexes, allowing them to be queried through Stroom’s various data source querying mechanisms. These indexes may have been populated using a Stroom pipeline (see here).
Searching using a Stroom dashboard
Searching an Elasticsearch index (or data stream) using a Stroom dashboard is conceptually similar to the process described in Dashboards.
Before you set the dashboard’s data source, you must first create an Elastic Index document to tell Stroom which index (or indices) you wish to query.
Create an Elastic Index document
- Right-click a folder in the Stroom Explorer pane.
- Select the option to create a new Elastic Index document.
- Enter a name for the index document and click OK.
- Click the button next to the Cluster configuration field label.
- In the dialog that appears, select the Elastic Cluster document where the index exists, and click OK.
- Enter the name of an index or data stream in Index name or pattern. Data view (formerly known as index pattern) syntax is supported, which enables you to query multiple indices or data streams at once. For example: stroom-events-v1.
- (Optional) Set Search slices, the number of parallel workers that will query the index. For very large indices, increasing this value up to the number of shards can improve scroll performance, allowing you to download results faster.
- (Optional) Set Search scroll size, the number of documents to return in each search response. Larger values generally improve efficiency. By default, Elasticsearch limits this value to 10,000.
- Click Test Connection. A dialog will appear with the result, which will state Connection Success if the connection was successful and the index pattern matched one or more indices.
- Click OK to save the document.
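The Search slices and Search scroll size settings correspond to Elasticsearch's sliced scroll API: each slice is an independent scroll that reads a disjoint subset of the matching documents, and the scroll size is the page size of each response. The sketch below shows the shape of the request bodies a client would issue for a sliced scroll; it is illustrative only and is not Stroom's actual implementation.

```python
import json


def sliced_scroll_bodies(query: dict, num_slices: int, scroll_size: int) -> list[dict]:
    """Build one scroll request body per slice.

    Each body carries a "slice" clause ({"id": i, "max": num_slices}), so
    Elasticsearch partitions the matching documents disjointly across the
    parallel scrolls, and a "size" giving the documents per response page.
    """
    return [
        {
            "slice": {"id": i, "max": num_slices},
            "size": scroll_size,
            "query": query,
        }
        for i in range(num_slices)
    ]


# Example: 4 parallel workers (Search slices), 1000 documents per
# response (Search scroll size).
bodies = sliced_scroll_bodies({"match_all": {}}, num_slices=4, scroll_size=1000)
print(json.dumps(bodies[0]))
```

Each of the four bodies would be sent as a separate scroll search, which is why slice counts up to the shard count can speed up result downloads.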
Set the Elastic Index document as the dashboard data source
- Open or create a dashboard.
- Click the button in the Query panel.
- Click the button next to the Data Source field label.
- Select the Elastic Index document you created and click OK.
- Configure the query expression as explained in Dashboards. Note the tips for particular Elasticsearch field mapping data types.
- Configure the table.
Query expression tips
Certain Elasticsearch field mapping types support special syntax when used in a Stroom dashboard query expression.
To identify the field mapping type for a particular field:
- Click the button in the Query panel to add a new expression item.
- Select the Elasticsearch field name in the drop-down list.
- Note the blue data type indicator to the far right of the row.
Common examples are: keyword, text and number.
After you identify the field mapping type, move the mouse cursor over the mapping type indicator. A tooltip appears, explaining various types of queries you can perform against that particular field’s type.
Searching multiple indices
Using data view (index pattern) syntax, you can create powerful dashboards that query multiple indices at a time.
An example of this is where you have multiple indices covering different types of email systems.
Let’s assume these indices are named: stroom-exchange-v1, stroom-domino-v1 and stroom-mailu-v1.
There is a common set of fields across all three indices: @timestamp, Subject, Sender and Recipient.
You want to allow search across all indices at once, in effect creating a unified email dashboard.
You can achieve this by creating an Elastic Index document called (for example) Elastic-Email-Combined and setting the property Index name or pattern to: stroom-exchange-v1,stroom-domino-v1,stroom-mailu-v1.
Save the document and re-open the dashboard.
You’ll notice that the available fields are a union of the fields across all three indices.
You can now search by any of these - in particular, the fields common to all three.
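A comma-separated pattern selects exactly the named indices; index pattern (data view) syntax also supports wildcards, so a pattern such as stroom-*-v1 would select the same three indices here. The following sketch illustrates how such a pattern selects index names; it is an illustration of the matching semantics, not Elasticsearch's actual resolver.

```python
from fnmatch import fnmatchcase


def matching_indices(pattern: str, index_names: list[str]) -> list[str]:
    """Return the index names selected by a comma-separated pattern.

    Each comma-separated term may contain '*' wildcards, as in
    Elasticsearch index patterns / data views.
    """
    terms = [t.strip() for t in pattern.split(",") if t.strip()]
    return [name for name in index_names
            if any(fnmatchcase(name, t) for t in terms)]


indices = ["stroom-exchange-v1", "stroom-domino-v1", "stroom-mailu-v1", "other-index"]

# The explicit comma-separated pattern from the example above:
print(matching_indices("stroom-exchange-v1,stroom-domino-v1,stroom-mailu-v1", indices))
# → ['stroom-exchange-v1', 'stroom-domino-v1', 'stroom-mailu-v1']

# A wildcard selects the same indices, and will also pick up any
# future index matching the naming convention:
print(matching_indices("stroom-*-v1", indices))
# → ['stroom-exchange-v1', 'stroom-domino-v1', 'stroom-mailu-v1']
```

The wildcard form is often preferable for a combined dashboard, since a newly added email index following the same naming convention is picked up without editing the Elastic Index document.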
4 - Internal Data Sources
Stroom provides a number of built-in data sources for querying the inner workings of Stroom. These data sources do not have a corresponding Document, so they do not feature in the explorer tree.
These data sources appear as children of the root folder when selecting a data source in a Dashboard or View. They are also available in the list of data sources when editing a Query.

Analytics
TODO
Complete

Annotations
Annotations are a means of annotating search results with additional information and assigning those annotations to users. The Annotations data source allows you to query the annotations that have been created.
Field | Type | Description
---|---|---
annotation:Id | Long | Annotation unique identifier.
annotation:CreatedOn | Date | Date created.
annotation:CreatedBy | String | Username of the user that created the annotation.
annotation:UpdatedOn | Date | Date last updated.
annotation:UpdatedBy | String | Username of the user that last updated the annotation.
annotation:Title | String | The title of the annotation.
annotation:Subject | String | The subject of the annotation.
annotation:AssignedTo | String | Username the annotation is assigned to.
annotation:Comment | String | Any comments on the annotation.
annotation:History | String | History of changes to the annotation.
Dual
The Dual data source is one with a single field that always returns one row with the same value.
This data source can be useful for testing expression functions.
It can also be useful when combined with an extraction pipeline that uses the stroom:http-call()
XSLT function in order to make a single HTTP call using Dashboard parameter values.
Field | Type | Description
---|---|---
Dummy | String | Always one row that has the value X.
Index Shards
Exposes the details of the index shards that make up Stroom’s Lucene based index. Each index is split up into one or more partitions and each partition is further divided into one or more shards. Each row represents one index shard.
Field | Type | Description
---|---|---
Node | String | The name of the node that the index belongs to.
Index | String | The name of the index document.
Index Name | String | The name of the index document.
Volume Path | String | The file path for the index shard.
Volume Group | String | The name of the volume group the index is using.
Partition | String | The name of the partition that the shard is in.
Doc Count | Integer | The number of documents in the shard.
File Size | Long | The size of the shard on disk in bytes.
Status | String | The status of the shard (Closed, Open, Closing, Opening, New, Deleted, Corrupt).
Last Commit | Date | The time and date of the last commit to the shard.
Meta Store
Exposes details of the streams held in Stroom’s stream (aka meta) store. Each row represents one stream.
Field | Type | Description
---|---|---
Feed | String | The name of the feed the stream belongs to.
Pipeline | String | The name of the pipeline that created the stream. [Optional]
Pipeline Name | String | The name of the pipeline that created the stream. [Optional]
Status | String | The status of the stream (Unlocked, Locked, Deleted).
Type | String | The Stream Type, e.g. Events, Raw Events, etc.
Id | Long | The unique ID (within this Stroom cluster) for the stream.
Parent Id | Long | The unique ID (within this Stroom cluster) for the parent stream, e.g. the Raw stream that spawned an Events stream. [Optional]
Processor Id | Long | The unique ID (within this Stroom cluster) for the processor that produced this stream. [Optional]
Processor Filter Id | Long | The unique ID (within this Stroom cluster) for the processor filter that produced this stream. [Optional]
Processor Task Id | Long | The unique ID (within this Stroom cluster) for the processor task that produced this stream. [Optional]
Create Time | Date | The time the stream was created.
Effective Time | Date | The time that the data in this stream is effective for. This is only used for reference data streams and is the time that the snapshot of reference data was captured. [Optional]
Status Time | Date | The time that the status was last changed.
Duration | Long | The time it took to process the stream in milliseconds. [Optional]
Read Count | Long | The number of records read in segmented streams. [Optional]
Write Count | Long | The number of records written in segmented streams. [Optional]
Info Count | Long | The number of INFO messages.
Warning Count | Long | The number of WARNING messages.
Error Count | Long | The number of ERROR messages.
Fatal Error Count | Long | The number of FATAL_ERROR messages.
File Size | Long | The compressed size of the stream on disk in bytes.
Raw Size | Long | The uncompressed size of the stream on disk in bytes.
Processor Tasks
Exposes details of the tasks spawned by the processor filters. Each row represents one processor task.
Field | Type | Description
---|---|---
Create Time | Date | The time the task was created.
Create Time Ms | Long | The time the task was created (milliseconds).
Start Time | Date | The time the task was executed.
Start Time Ms | Long | The time the task was executed (milliseconds).
End Time | Date | The time the task finished.
End Time Ms | Long | The time the task finished (milliseconds).
Status Time | Date | The time the status of the task was last updated.
Status Time Ms | Long | The time the status of the task was last updated (milliseconds).
Meta Id | Long | The unique ID (within this Stroom cluster) of the stream the task was for.
Node | String | The name of the node that the task was executed on.
Pipeline | String | The name of the pipeline that spawned the task.
Pipeline Name | String | The name of the pipeline that spawned the task.
Processor Filter Id | Long | The ID of the processor filter that spawned the task.
Processor Filter Priority | Integer | The priority of the processor filter when the task was executed.
Processor Id | Long | The unique ID (within this Stroom cluster) of the pipeline processor that spawned this task.
Feed | String | The name of the feed the task's stream belongs to.
Status | String | The status of the task (Created, Queued, Processing, Complete, Failed, Deleted).
Task Id | Long | The unique ID (within this Stroom cluster) of this task.
Reference Data Store
Warning
This data source is for advanced users only and is primarily aimed at debugging issues with reference data.

Reference data is written to a persistent cache on storage local to the node. This data source exposes the data held in the store on the local node only. Given that most Stroom deployments are clustered and the UI nodes typically do not perform processing, the UI node will have no reference data.
Task Manager
This data source exposes the background tasks currently running across the Stroom cluster. Each row represents a single background server task.
Requires the Manage Tasks
application permission.
Field | Type | Description
---|---|---
Node | String | The name of the node that the task is running on.
Name | String | The name of the task.
User | String | The user name of the user that the task is running as.
Submit Time | Date | The time the task was submitted.
Age | Duration | The time the task has been running for.
Info | String | The latest information message from the task.