Documents
All Documents in Stroom share some common elements:
- UUID - Uniquely identifies the document within Stroom, and also when it is exported into another Stroom instance.
- Type - The type of the Document, as used in the DocRef.
- Documentation - Every Document has a Documentation tab for recording any documentation that relates to the Document, see Documenting Content.
Some Documents are very simple, with just text content and documentation, e.g. XSLT. Others are much more complex, e.g. Pipeline, with various tabs to manage the content of the Document.
The following is a list of all Document types in Stroom.
Configuration
Documents that are used as configuration for other documents.
Dictionary
- Type: Dictionary
A Dictionary is essentially a list of ‘words’, where each ‘word’ is separated by a new line.
Dictionaries can be used in filter expressions via the IN DICTIONARY condition.
They allow for the reuse of the same set of values across many search expressions.
Dictionaries also support inheritance so one dictionary can import the contents of other dictionaries.
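To make the word-list and import behaviour concrete, here is a minimal Python sketch. It is not Stroom's implementation: the Dictionary class, the in_dictionary function and the example words are hypothetical, and the sketch assumes an exact, whitespace-trimmed match, which may differ from how Stroom actually evaluates IN DICTIONARY.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Dictionary:
    """Hypothetical model of a Dictionary Document: one word per line, plus imports."""
    name: str
    text: str  # raw content, one 'word' per line
    imports: List[Dictionary] = field(default_factory=list)

    def own_words(self) -> Set[str]:
        # Split on new lines, trimming whitespace and ignoring blank lines.
        return {line.strip() for line in self.text.splitlines() if line.strip()}

    def all_words(self, _seen: Optional[Set[str]] = None) -> Set[str]:
        # Merge in words from imported dictionaries; _seen prevents infinite
        # recursion if dictionaries import each other.
        seen = set() if _seen is None else _seen
        if self.name in seen:
            return set()
        seen.add(self.name)
        words = self.own_words()
        for imported in self.imports:
            words |= imported.all_words(seen)
        return words


def in_dictionary(value: str, dictionary: Dictionary) -> bool:
    """Toy IN DICTIONARY check: exact match against any word, including imports."""
    return value in dictionary.all_words()


bad_ips = Dictionary("bad_ips", "10.0.0.1\n10.0.0.2\n")
more_bad_ips = Dictionary("more_bad_ips", "192.168.1.9", imports=[bad_ips])
print(in_dictionary("10.0.0.2", more_bad_ips))  # True
```

Because more_bad_ips imports bad_ips, a value from either list matches when testing against more_bad_ips, while bad_ips on its own knows nothing about the importing dictionary.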
Documentation
- Type: Documentation
A Document type for simply storing user created documentation, e.g. adding a Documentation document into a folder to describe the contents of that folder.
Elastic Cluster
- Type: ElasticCluster
Defines the connection details for a single Elasticsearch cluster. This Elastic Cluster Document can then be used by one or more Elastic Index Documents.
Kafka Configuration
- Type: KafkaConfig
Defines the connection details for a single Kafka cluster. This Kafka Configuration Document can then be used by one or more StandardKafkaProducer pipeline elements.
S3 Configuration
- Type: S3Config
Defines the configuration for connecting to S3.
Script
- Type: Script
Contains a JavaScript script that is used as the source for a Visualisation Document. Scripts can depend on other Script Documents, e.g. to allow re-use of common code.
ScyllaDB
- Type: ScyllaDB
Defines the connection details for a ScyllaDB state store instance.
Visualisation
- Type: Visualisation
Defines a data visualisation that can be used in a Dashboard Document. The Visualisation defines the settings that will be available to the user when it is embedded in a Dashboard. A Visualisation depends on a Script Document for the JavaScript code that makes it work.
Data Processing
Documents relating to the processing of data.
Feed
- Type: Feed
The Feed is Stroom’s way of compartmentalising data that has been ingested or created by a Pipeline. Ingested data must specify the Feed that it is destined for.
The Feed Document defines the character encoding for the data in the Feed, the type of data that will be received into it (e.g. Raw Events) and, optionally, a Volume Group to use for data storage.
The Feed Document can also control the ingest of data using its Feed Status property, and can be used for viewing the data belonging to that Feed.
Pipeline
- Type: Pipeline
A Pipeline defines a chain of Pipeline elements that consumes from a source of data (a Stream of raw data or cooked events) then processes it according to the elements used in the chain. Pipelines can be linear or branching and support inheritance of other pipelines to allow re-use of common structural parts.
The Pipeline Document defines the structure of the pipeline and the configuration of each of the elements in that pipeline. It also defines the filter(s) that will be used to control what data is passed through the pipeline and the priority of processing. The Pipeline Document can be used to view the data produced by the pipeline and to monitor its processing state and progress.
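As a rough illustration of a linear chain, the Python sketch below passes records through a list of elements in order, each consuming the output of the one before. Real Stroom pipeline elements are configured in the Pipeline Document and operate on streams of XML events; the run_pipeline function and the three stand-in elements here are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical element type: takes an iterable of records, yields processed records.
Element = Callable[[Iterable[str]], Iterator[str]]


def run_pipeline(source: Iterable[str], elements: List[Element]) -> List[str]:
    """Feed the source through each element in order (a linear chain)."""
    stream: Iterable[str] = source
    for element in elements:
        # Each element consumes the output of the previous element.
        stream = element(stream)
    return list(stream)


def trim(records: Iterable[str]) -> Iterator[str]:
    # Stand-in for a parsing step: strip surrounding whitespace.
    for record in records:
        yield record.strip()


def drop_blank(records: Iterable[str]) -> Iterator[str]:
    # Stand-in for a filtering step: discard empty records.
    for record in records:
        if record:
            yield record


def to_upper(records: Iterable[str]) -> Iterator[str]:
    # Stand-in for a transformation step.
    for record in records:
        yield record.upper()


parent = [trim, drop_blank]
child = parent + [to_upper]  # toy "inheritance": reuse the parent's elements, then extend
print(run_pipeline([" a ", "", "b "], child))  # ['A', 'B']
```

In this toy model, inheritance is just a child element list that starts from the parent's list, which mirrors the idea of re-using common structural parts without claiming anything about how Stroom stores inherited pipelines.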
See Also
Pipelines
Indexing
Documents relating to the process of adding data into an index, such as Lucene or Elasticsearch.
Elastic Index
- Type: ElasticIndex
Defines an index that exists within an Elasticsearch cluster. This Document is used in the configuration of the ElasticIndexingFilter pipeline element.
See Also
Elasticsearch
Lucene Index
- Type: Index
Lucene Index is the standard built-in index within Stroom and is one of many data sources. An index is like a catalogue in a library and provides a very fast way to access documents/records/events when searching using fields that have been indexed. The index stores the field values and pointers to the document they came from (the Stream and Event IDs). Data can be indexed using multiple indexes to allow fast access in different ways.
The Lucene Index Document optionally defines the fields that will be indexed (it is possible to define the fields dynamically) and their types. It also allows for configuration of the way the data in the index will be stored, partitioned and retained.
The Lucene Index Document is used by the IndexingFilter and DynamicIndexingFilter pipeline elements.
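The library-catalogue analogy can be sketched as an inverted index. The Python below is a toy, not Lucene: ToyIndex and the field names UserId and Action are hypothetical. It only shows the idea that the index maps field values to pointers (Stream ID, Event ID) back to the source events, and that only fields declared as indexed can be searched.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

EventRef = Tuple[int, int]  # (Stream ID, Event ID): a pointer back to the source event


class ToyIndex:
    """Toy inverted index mapping (field, value) pairs to the events that contain them."""

    def __init__(self, indexed_fields: Iterable[str]) -> None:
        self.indexed_fields = set(indexed_fields)
        self._postings: Dict[Tuple[str, str], Set[EventRef]] = defaultdict(set)

    def add(self, stream_id: int, event_id: int, event: Dict[str, str]) -> None:
        # Store only the fields declared as indexed; everything else is ignored.
        for name, value in event.items():
            if name in self.indexed_fields:
                self._postings[(name, value)].add((stream_id, event_id))

    def search(self, name: str, value: str) -> Set[EventRef]:
        # One lookup instead of scanning every event in every stream.
        return set(self._postings.get((name, value), ()))


index = ToyIndex(["UserId"])
index.add(1, 1, {"UserId": "jbloggs", "Action": "login"})
index.add(1, 2, {"UserId": "asmith", "Action": "logout"})
index.add(2, 7, {"UserId": "jbloggs", "Action": "logout"})
print(sorted(index.search("UserId", "jbloggs")))  # [(1, 1), (2, 7)]
```

A search returns only pointers; in Stroom terms the matching events would then be retrieved from their Streams. Indexing the same data into a second ToyIndex with different fields is the toy equivalent of using multiple indexes for different access paths.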
See Also
Lucene Indexes
Solr Index
- Type: SolrIndex
Solr Index represents an index on a Solr cluster. It defines the connection details for connecting to that cluster and the structure of the index. It is used by the SolrIndexingFilter pipeline element.
See Also
Solr Integration
State Store
- Type: StateStore
Defines a place in which to store state.
Statistic Store
- Type: StatisticStore
Defines a logical statistic store used to hold statistical data of a particular type and aggregation window. Statistics in Stroom is a way to capture counts or values from events and record how they change over time, with the counts/values aggregated (sum/mean) across time windows.
The Statistic Store Document configures the type of the statistic (Count or Value), the tags that are used to qualify a statistic event and the size of the aggregation windows.
It also supports the definition of roll-ups that allow for aggregation over all values of a tag.
Tags can be things like user, node, feed, etc. and can be used to filter data when querying the statistic store in a Dashboard/Query.
It is used by the StatisticsFilter pipeline element.
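The windowed aggregation and roll-ups can be illustrated with a short Python sketch for a Count statistic. It is a simplified model rather than Stroom's code: the aggregate_counts function, the one-hour window, the tag names and the use of "*" to mark a rolled-up tag value are all illustrative assumptions. A Value statistic would take the mean of the values in each window instead of summing them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

ROLLED_UP = "*"  # marker used in this sketch for "all values of this tag"

# (event time in epoch milliseconds, tag name -> tag value, count)
Event = Tuple[int, Dict[str, str], int]


def window_start(epoch_ms: int, window_ms: int) -> int:
    # Truncate the event time to the start of its aggregation window.
    return epoch_ms - (epoch_ms % window_ms)


def aggregate_counts(events: Iterable[Event], window_ms: int,
                     rollup_tags: Iterable[str] = ()) -> Dict[tuple, int]:
    """Sum counts per (window start, tag values), adding single-tag roll-ups."""
    rollup_tags = list(dict.fromkeys(rollup_tags))  # de-duplicate, keep order
    totals: Dict[tuple, int] = defaultdict(int)
    for epoch_ms, tags, count in events:
        bucket = window_start(epoch_ms, window_ms)
        # Record the event against its exact tag values...
        variants = [dict(tags)]
        # ...and, for each roll-up tag it carries, against ROLLED_UP for that tag.
        for tag in rollup_tags:
            if tag in tags:
                variants.append({**tags, tag: ROLLED_UP})
        for variant in variants:
            totals[(bucket, tuple(sorted(variant.items())))] += count
    return dict(totals)


HOUR_MS = 3_600_000
events = [
    (0, {"user": "alice", "feed": "A"}, 1),
    (1_000, {"user": "bob", "feed": "A"}, 2),
    (HOUR_MS + 5, {"user": "alice", "feed": "A"}, 4),
]
totals = aggregate_counts(events, HOUR_MS, rollup_tags=["user"])
print(totals[(0, (("feed", "A"), ("user", ROLLED_UP)))])  # 3
```

The rolled-up key answers "how many events for feed A in this hour, across all users" without summing per-user rows at query time, which is the trade-off roll-ups make: more stored rows in exchange for cheaper queries.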
Stroom-Stats Store
- Type: StroomStatsStore
The Stroom-Stats Store Document is deprecated and should not be used.
Search
Documents relating to searching for data in Stroom.
Analytic Rule
- Type: AnalyticRule
Defines an analytic rule that can be run to alert on events meeting certain criteria. The criteria are defined using a StroomQL query. The analytic can be processed in different ways:
- Streaming
- Table Builder
- Scheduled Query
Dashboard
- Type: Dashboard
The Dashboard Document defines a data querying and visualisation dashboard. The dashboard is highly customisable to allow querying of many different data sources of different types. Queried data can be displayed in tabular form, visualised using interactive charts/graphs or rendered as HTML.
The Dashboard Document can be used for ad-hoc querying/visualising of data, to construct a dashboard for others to view, or simply to view an already constructed dashboard. Dashboards can be parameterised so that, for example, all queries on the dashboard display data for the same user. For ad-hoc querying of data from a single data source, it is recommended to use a Query instead.
Query
- Type: Query
A Query Document defines a StroomQL query and is used to execute that query and view its results. A Query can query many types of data source, including Views, Lucene Indexes and Searchables.
View
- Type: View
A View is an abstraction over a data source (such as a Lucene Index) and, optionally, an extraction pipeline. Views provide a much simpler way for users to query data, as the user can simply query against the View without any knowledge of the underlying data source or how that data is extracted.
Transformation
Documents relating to the transformation of data.
Text Converter
- Type: TextConverter
A Text Converter Document defines the specification for splitting text data into records/fields using Data Splitter, or for wrapping fragment XML with an XMLFragmentParser pipeline element.
The content of the Document is either XML in the data-splitter:3 namespace or a fragment parser specification (see Pipeline Recipes).
This Document is used by the following pipeline elements:
XML Schema
- Type: XMLSchema
This Document defines an XML Schema that can be used within Stroom for validation of XML documents. The XML Schema Document content is the XMLSchema text. This Document also defines the following:
- Namespace URI - The XML namespace of the XMLSchema and the XML document that the schema will validate.
- System Id - An ID (that is unique in Stroom) that can be used in the xsi:schemaLocation attribute, e.g. xsi:schemaLocation="event-logging:3 file://event-logging-v3.4.2.xsd".
- Schema Group - A name to group multiple versions of the same schema. The SchemaFilter can be configured to only use schemas matching a configured group.
The XML Schema Document also provides a handy interactive viewer for viewing and navigating the XMLSchema in a graphical representation.
This Document is used by the SchemaFilter pipeline element.
XSL Translation
- Type: XSLT
The content of this Document is an XSLT document for transforming data in a pipeline. This Document is used by the XSLTFilter pipeline element.