Data Sources
1 - Lucene Index Data Source
Stroom’s primary data source is its internal Lucene based search indexes. For details of how data is indexed see Lucene Indexes.
TODO
Complete this section

2 - Statistics
TODO
Complete this section

3 - Elasticsearch
Stroom can integrate with external Elasticsearch indexes, allowing them to be queried through Stroom’s various data source querying mechanisms. These indexes may have been populated using a Stroom pipeline (see here).
Searching using a Stroom dashboard
Searching an Elasticsearch index (or data stream) using a Stroom dashboard is conceptually similar to the process described in Dashboards.
Before you set the dashboard’s data source, you must first create an Elastic Index document to tell Stroom which index (or indices) you wish to query.
Create an Elastic Index document
- Right-click a folder in the Stroom Explorer pane.
- Select the option to create a new Elastic Index document.
- Enter a name for the index document and click OK.
- Click the button next to the Cluster configuration field label.
- In the dialog that appears, select the Elastic Cluster document where the index exists, and click OK.
- Enter the name of an index or data stream in Index name or pattern. Data view (formerly known as index pattern) syntax is supported, which enables you to query multiple indices or data streams at once. For example: stroom-events-v1.
- (Optional) Set Search slices, the number of parallel workers that will query the index. For very large indices, increasing this value up to the number of shards can improve scroll performance, allowing you to download results faster.
- (Optional) Set Search scroll size, the number of documents to return in each search response. Larger values generally improve efficiency. By default, Elasticsearch limits this value to 10,000.
- Click Test Connection. A dialog will appear with the result, which will state Connection Success if the connection was successful and the index pattern matched one or more indices.
- Click OK to save the document.
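The Search slices and Search scroll size settings correspond to Elasticsearch's sliced scroll API: each slice is an independent scroll that reads a disjoint subset of the matching documents, and the scroll size is the page size of each response. The sketch below shows the shape of the request bodies a client would issue for a sliced scroll; it is illustrative only and is not Stroom's actual implementation.

```python
import json


def sliced_scroll_bodies(query: dict, num_slices: int, scroll_size: int) -> list[dict]:
    """Build one scroll request body per slice.

    Each body carries a "slice" clause ({"id": i, "max": num_slices}), so
    Elasticsearch partitions the matching documents disjointly across the
    parallel scrolls, and a "size" giving the documents per response page.
    """
    return [
        {
            "slice": {"id": i, "max": num_slices},
            "size": scroll_size,
            "query": query,
        }
        for i in range(num_slices)
    ]


# Example: 4 parallel workers (Search slices), 1000 documents per
# response (Search scroll size).
bodies = sliced_scroll_bodies({"match_all": {}}, num_slices=4, scroll_size=1000)
print(json.dumps(bodies[0]))
```

Each of the four bodies would be sent as a separate scroll search, which is why slice counts up to the shard count can speed up result downloads.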
Set the Elastic Index document as the dashboard data source
- Open or create a dashboard.
- Click the button in the Query panel.
- Click the button next to the Data Source field label.
- Select the Elastic Index document you created and click OK.
- Configure the query expression as explained in Dashboards. Note the tips for particular Elasticsearch field mapping data types.
- Configure the table.
Query expression tips
Certain Elasticsearch field mapping types support special syntax when used in a Stroom dashboard query expression.
To identify the field mapping type for a particular field:
- Click the button in the Query panel to add a new expression item.
- Select the Elasticsearch field name in the drop-down list.
- Note the blue data type indicator to the far right of the row.
Common examples are: keyword, text and number.
After you identify the field mapping type, move the mouse cursor over the mapping type indicator. A tooltip appears, explaining various types of queries you can perform against that particular field’s type.
Searching multiple indices
Using data view (index pattern) syntax, you can create powerful dashboards that query multiple indices at a time.
An example of this is where you have multiple indices covering different types of email systems.
Let’s assume these indices are named: stroom-exchange-v1, stroom-domino-v1 and stroom-mailu-v1.
There is a common set of fields across all three indices: @timestamp, Subject, Sender and Recipient.
You want to allow search across all indices at once, in effect creating a unified email dashboard.
You can achieve this by creating an Elastic Index document called (for example) Elastic-Email-Combined and setting the property Index name or pattern to: stroom-exchange-v1,stroom-domino-v1,stroom-mailu-v1.
Save the document and re-open the dashboard.
You’ll notice that the available fields are a union of the fields across all three indices.
You can now search by any of these - in particular, the fields common to all three.
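A comma-separated pattern selects exactly the named indices; index pattern (data view) syntax also supports wildcards, so a pattern such as stroom-*-v1 would select the same three indices here. The following sketch illustrates how such a pattern selects index names; it is an illustration of the matching semantics, not Elasticsearch's actual resolver.

```python
from fnmatch import fnmatchcase


def matching_indices(pattern: str, index_names: list[str]) -> list[str]:
    """Return the index names selected by a comma-separated pattern.

    Each comma-separated term may contain '*' wildcards, as in
    Elasticsearch index patterns / data views.
    """
    terms = [t.strip() for t in pattern.split(",") if t.strip()]
    return [name for name in index_names
            if any(fnmatchcase(name, t) for t in terms)]


indices = ["stroom-exchange-v1", "stroom-domino-v1", "stroom-mailu-v1", "other-index"]

# The explicit comma-separated pattern from the example above:
print(matching_indices("stroom-exchange-v1,stroom-domino-v1,stroom-mailu-v1", indices))
# → ['stroom-exchange-v1', 'stroom-domino-v1', 'stroom-mailu-v1']

# A wildcard selects the same indices, and will also pick up any
# future index matching the naming convention:
print(matching_indices("stroom-*-v1", indices))
# → ['stroom-exchange-v1', 'stroom-domino-v1', 'stroom-mailu-v1']
```

The wildcard form is often preferable for a combined dashboard, since a newly added email index following the same naming convention is picked up without editing the Elastic Index document.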
4 - Internal Data Sources
Stroom provides a number of built-in data sources for querying the inner workings of Stroom. These data sources do not have a corresponding Document, so they do not feature in the explorer tree.
These data sources appear as children of the root folder when selecting a data source in a Dashboard or View. They are also available in the list of data sources when editing a Query.

Analytics
TODO
Complete

Annotations
Annotations are a means of annotating search results with additional information and assigning those annotations to users. The Annotations data source allows you to query the annotations that have been created.
Field | Type | Description
---|---|---
annotation:Id | Long | Annotation unique identifier.
annotation:CreatedOn | Date | Date created.
annotation:CreatedBy | String | Username of the user that created the annotation.
annotation:UpdatedOn | Date | Date last updated.
annotation:UpdatedBy | String | Username of the user that last updated the annotation.
annotation:Title | String | The title of the annotation.
annotation:Subject | String | The subject of the annotation.
annotation:AssignedTo | String | Username the annotation is assigned to.
annotation:Comment | String | Any comments on the annotation.
annotation:History | String | History of changes to the annotation.
Dual
The Dual data source is one with a single field that always returns one row with the same value.
This data source can be useful for testing expression functions.
It can also be useful when combined with an extraction pipeline that uses the stroom:http-call()
XSLT function in order to make a single HTTP call using Dashboard parameter values.
Field | Type | Description
---|---|---
Dummy | String | Always one row that has the value X.
Index Shards
Exposes the details of the index shards that make up Stroom’s Lucene based index. Each index is split up into one or more partitions and each partition is further divided into one or more shards. Each row represents one index shard.
Field | Type | Description
---|---|---
Node | String | The name of the node that the index belongs to.
Index | String | The name of the index document.
Index Name | String | The name of the index document.
Volume Path | String | The file path for the index shard.
Volume Group | String | The name of the volume group the index is using.
Partition | String | The name of the partition that the shard is in.
Doc Count | Integer | The number of documents in the shard.
File Size | Long | The size of the shard on disk in bytes.
Status | String | The status of the shard (Closed, Open, Closing, Opening, New, Deleted, Corrupt).
Last Commit | Date | The time and date of the last commit to the shard.
Meta Store
Exposes details of the streams held in Stroom’s stream (aka meta) store. Each row represents one stream.
Field | Type | Description
---|---|---
Feed | String | The name of the feed the stream belongs to.
Pipeline | String | The name of the pipeline that created the stream. [Optional]
Pipeline Name | String | The name of the pipeline that created the stream. [Optional]
Status | String | The status of the stream (Unlocked, Locked, Deleted).
Type | String | The Stream Type, e.g. Events, Raw Events, etc.
Id | Long | The unique ID (within this Stroom cluster) for the stream.
Parent Id | Long | The unique ID (within this Stroom cluster) for the parent stream, e.g. the Raw stream that spawned an Events stream. [Optional]
Processor Id | Long | The unique ID (within this Stroom cluster) for the processor that produced this stream. [Optional]
Processor Filter Id | Long | The unique ID (within this Stroom cluster) for the processor filter that produced this stream. [Optional]
Processor Task Id | Long | The unique ID (within this Stroom cluster) for the processor task that produced this stream. [Optional]
Create Time | Date | The time the stream was created.
Effective Time | Date | The time that the data in this stream is effective for. This is only used for reference data streams and is the time that the snapshot of reference data was captured. [Optional]
Status Time | Date | The time that the status was last changed.
Duration | Long | The time it took to process the stream in milliseconds. [Optional]
Read Count | Long | The number of records read in segmented streams. [Optional]
Write Count | Long | The number of records written in segmented streams. [Optional]
Info Count | Long | The number of INFO messages.
Warning Count | Long | The number of WARNING messages.
Error Count | Long | The number of ERROR messages.
Fatal Error Count | Long | The number of FATAL_ERROR messages.
File Size | Long | The compressed size of the stream on disk in bytes.
Raw Size | Long | The uncompressed size of the stream on disk in bytes.
Processor Tasks
Exposes details of the tasks spawned by the processor filters. Each row represents one processor task.
Field | Type | Description
---|---|---
Create Time | Date | The time the task was created.
Create Time Ms | Long | The time the task was created (milliseconds).
Start Time | Date | The time the task was executed.
Start Time Ms | Long | The time the task was executed (milliseconds).
End Time | Date | The time the task finished.
End Time Ms | Long | The time the task finished (milliseconds).
Status Time | Date | The time the status of the task was last updated.
Status Time Ms | Long | The time the status of the task was last updated (milliseconds).
Meta Id | Long | The unique ID (within this Stroom cluster) of the stream the task was for.
Node | String | The name of the node that the task was executed on.
Pipeline | String | The name of the pipeline that spawned the task.
Pipeline Name | String | The name of the pipeline that spawned the task.
Processor Filter Id | Long | The ID of the processor filter that spawned the task.
Processor Filter Priority | Integer | The priority of the processor filter when the task was executed.
Processor Id | Long | The unique ID (within this Stroom cluster) of the pipeline processor that spawned this task.
Feed | String | The name of the feed the task's stream belongs to.
Status | String | The status of the task (Created, Queued, Processing, Complete, Failed, Deleted).
Task Id | Long | The unique ID (within this Stroom cluster) of this task.
Reference Data Store
Warning
This data source is for advanced users only and is primarily aimed at debugging issues with reference data.

Reference data is written to a persistent cache on storage local to the node. This data source exposes the data held in the store on the local node only. Given that most Stroom deployments are clustered and the UI nodes typically do not perform processing, the UI node will have no reference data.
Task Manager
This data source exposes the background tasks currently running across the Stroom cluster. Each row represents a single background server task.
Requires the Manage Tasks
application permission.
Field | Type | Description
---|---|---
Node | String | The name of the node that the task is running on.
Name | String | The name of the task.
User | String | The user name of the user that the task is running as.
Submit Time | Date | The time the task was submitted.
Age | Duration | The time the task has been running for.
Info | String | The latest information message from the task.