The page that you are currently viewing is for an old version of Stroom (7.1). The documentation for the latest version of Stroom (7.6) can be found using the version drop-down at the top of the screen.
Element Reference
Reader
Reader elements read and transform the data at the character level before it is parsed into a structured form.
BOMRemovalFilterInput
Removes the Byte Order Mark (if present) from the stream.
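The effect of this element can be sketched in Python. This is an illustrative sketch only, not Stroom's implementation (Stroom applies the removal at the stream level as data is read):

```python
import codecs

def remove_bom(data: bytes) -> bytes:
    """Strip a leading Byte Order Mark, if present (illustrative sketch)."""
    for bom in (codecs.BOM_UTF8, codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
        if data.startswith(bom):
            return data[len(bom):]
    return data

print(remove_bom(codecs.BOM_UTF8 + b"<root/>"))  # b'<root/>'
```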
BadTextXMLFilterReader
TODO - Add description
Element properties:
Name | Description | Default Value |
---|---|---|
tags | A comma separated list of XML elements between which non-escaped characters will be escaped. | - |
FindReplaceFilter
Replaces strings or regexes with new strings.
Element properties:
Name | Description | Default Value |
---|---|---|
bufferSize | The number of characters to buffer when matching the regex. | 1000 |
dotAll | Let ‘.’ match all characters in a regex. | false |
escapeFind | Whether or not to escape find pattern or text. | true |
escapeReplacement | Whether or not to escape replacement text. | true |
find | The text or regex pattern to find and replace. | - |
maxReplacements | The maximum number of times to try and replace text. There is no limit by default. | - |
regex | Whether the pattern should be treated as a literal or a regex. | false |
replacement | The replacement text. | - |
showReplacementCount | Show the total replacement count. | true |
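The properties above map closely onto standard regex replacement semantics. A minimal Python sketch of the behaviour (illustrative only; Stroom buffers the stream and handles escaping internally):

```python
import re

def find_replace(text, find, replacement, use_regex=False,
                 dot_all=False, max_replacements=0):
    """Illustrative sketch of find/replace semantics (not Stroom's code).
    max_replacements=0 means unlimited, mirroring 'no limit by default'."""
    flags = re.DOTALL if dot_all else 0
    # When not in regex mode, treat the find string as a literal
    pattern = find if use_regex else re.escape(find)
    return re.sub(pattern, replacement, text, count=max_replacements, flags=flags)

print(find_replace("2024-01-01 ERROR disk full", "ERROR", "WARN"))
# 2024-01-01 WARN disk full
```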
InvalidCharFilterReader
TODO - Add description
Element properties:
Name | Description | Default Value |
---|---|---|
xmlVersion | XML version, e.g. 1.0 or 1.1 | 1.1 |
InvalidXMLCharFilterReader
Strips out any characters that are not within the standard XML character set.
Element properties:
Name | Description | Default Value |
---|---|---|
xmlVersion | XML version, e.g. 1.0 or 1.1 | 1.1 |
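A sketch of the stripping behaviour in Python, using the character ranges from the XML 1.0 and 1.1 specifications (illustrative only, not Stroom's implementation):

```python
def strip_invalid_xml_chars(text: str, xml_version: str = "1.1") -> str:
    """Remove characters outside the XML character range (illustrative sketch)."""
    def valid(cp: int) -> bool:
        if xml_version == "1.1":
            # XML 1.1 allows all of U+0001..U+D7FF plus the ranges below
            return (0x1 <= cp <= 0xD7FF or 0xE000 <= cp <= 0xFFFD
                    or 0x10000 <= cp <= 0x10FFFF)
        # XML 1.0 is stricter about control characters
        return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
                or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)
    return "".join(ch for ch in text if valid(ord(ch)))

print(strip_invalid_xml_chars("ok\x00\x0btext", "1.0"))  # 'oktext'
```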
Reader
TODO - Add description
Parser
Parser elements parse raw text data that conforms to some kind of structure (e.g. XML, JSON, CSV) into XML events (elements, attributes, text, etc.) that can be further validated or transformed. The choice of Parser is dictated by the structure of the data. Parsers read the data using the character encoding defined on the feed.
CombinedParser
The original general-purpose reader/parser that covers all source data types but provides less flexibility than the source format-specific parsers such as dsParser.
Element properties:
Name | Description | Default Value |
---|---|---|
fixInvalidChars | Fix invalid XML characters from the input stream. | false |
namePattern | A name pattern to load a text converter dynamically. | - |
suppressDocumentNotFoundWarnings | If the text converter cannot be found to match the name pattern, suppress warnings. | false |
textConverter | The text converter configuration that should be used to parse the input data. | - |
type | The parser type, e.g. ‘JSON’, ‘XML’, ‘Data Splitter’. | - |
DSParser
A parser for data that uses Data Splitter code.
Element properties:
Name | Description | Default Value |
---|---|---|
namePattern | A name pattern to load a data splitter dynamically. | - |
suppressDocumentNotFoundWarnings | If the data splitter cannot be found to match the name pattern, suppress warnings. | false |
textConverter | The data splitter configuration that should be used to parse the input data. | - |
JSONParser
A built-in parser that converts JSON source data (in JSON fragment format) into an XML document.
Element properties:
Name | Description | Default Value |
---|---|---|
addRootObject | Add a root map element. | true |
allowBackslashEscapingAnyCharacter | Feature that can be enabled to accept backslash escaping of any character: if not enabled, only the characters explicitly listed by the JSON specification can be escaped this way (see the JSON spec for the small list of these characters). | false |
allowComments | Feature that determines whether the parser will allow use of Java/C++ style comments (both ‘/*’ and ‘//’ varieties) within parsed content. | false |
allowMissingValues | Feature that allows support for “missing” values in a JSON array: a missing value meaning a sequence of two commas with no value in between, only optional white space. | false |
allowNonNumericNumbers | Feature that allows parser to recognize set of “Not-a-Number” (NaN) tokens as legal floating number values (similar to how many other data formats and programming language source code allows it). | false |
allowNumericLeadingZeros | Feature that determines whether parser will allow JSON integral numbers to start with additional (ignorable) zeroes (like: 000001). | false |
allowSingleQuotes | Feature that determines whether the parser will allow use of single quotes (apostrophe, character ‘'’) for quoting strings (names and string values). If enabled, this is in addition to the double quotes required by the JSON specification. | false |
allowTrailingComma | Feature that determines whether we will allow for a single trailing comma following the final value (in an Array) or member (in an Object). These commas will simply be ignored. | false |
allowUnquotedControlChars | Feature that determines whether parser will allow JSON Strings to contain unquoted control characters (ASCII characters with value less than 32, including tab and line feed characters) or not. If feature is set false, an exception is thrown if such a character is encountered. | false |
allowUnquotedFieldNames | Feature that determines whether parser will allow use of unquoted field names (which is allowed by Javascript, but not by JSON specification). | false |
allowYamlComments | Feature that determines whether parser will allow use of YAML comments, ones starting with ‘#’ and continuing until the end of the line. This commenting style is common with scripting languages as well. | false |
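The conversion from JSON to XML can be sketched as follows, using the W3C XSLT 3.0 JSON vocabulary (the same `http://www.w3.org/2013/XSL/json` namespace referenced by JSONWriter below). The exact element naming Stroom emits is an assumption here; this sketch just illustrates the idea of mapping JSON values onto typed XML elements:

```python
import json
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/2013/XSL/json"

def to_xml(value, key=None):
    """Map a parsed JSON value onto the W3C JSON-to-XML vocabulary
    (illustrative sketch; naming details are an assumption)."""
    if isinstance(value, dict):
        el = ET.Element(f"{{{NS}}}map")
        for k, v in value.items():
            el.append(to_xml(v, k))
    elif isinstance(value, list):
        el = ET.Element(f"{{{NS}}}array")
        for v in value:
            el.append(to_xml(v))
    elif isinstance(value, bool):  # must test bool before int
        el = ET.Element(f"{{{NS}}}boolean")
        el.text = "true" if value else "false"
    elif isinstance(value, (int, float)):
        el = ET.Element(f"{{{NS}}}number")
        el.text = str(value)
    elif value is None:
        el = ET.Element(f"{{{NS}}}null")
    else:
        el = ET.Element(f"{{{NS}}}string")
        el.text = value
    if key is not None:
        el.set("key", key)
    return el

root = to_xml(json.loads('{"user": "jbloggs", "active": true}'))
print(ET.tostring(root, encoding="unicode"))
```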
XMLFragmentParser
A parser to convert multiple XML fragments into an XML document.
Element properties:
Name | Description | Default Value |
---|---|---|
namePattern | A name pattern to load a text converter dynamically. | - |
suppressDocumentNotFoundWarnings | If the text converter cannot be found to match the name pattern, suppress warnings. | false |
textConverter | The XML fragment wrapper that should be used to wrap the input XML. | - |
XMLParser
TODO - Add description
Filter
Filter elements work with XML events that have been generated by a parser. They can consume the events without modifying them, e.g. RecordCountFilter or modify them in some way, e.g. XSLTFilter. Multiple filters can be used one after another with each using the output from the last as its input.
ElasticIndexingFilter
TODO - Add description
Element properties:
Name | Description | Default Value |
---|---|---|
batchSize | Maximum number of documents to index in each bulk request | 10000 |
cluster | Target Elasticsearch cluster | - |
indexBaseName | Name of the Elasticsearch index | - |
indexNameDateFieldName | Name of the field containing the DateTime value to use when determining the index date suffix | @timestamp |
indexNameDateFormat | Format of the date to append to the index name (example: -yyyy). If unspecified, no date is appended. | - |
indexNameDateMaxFutureOffset | Do not append a time suffix to the index name for events occurring after the current time plus the specified offset | P1D |
indexNameDateMin | Do not append a time suffix to the index name for events occurring before this date. Date is assumed to be in UTC and of the format specified in indexNameDateMinFormat | - |
indexNameDateMinFormat | Date format of the supplied indexNameDateMin property | yyyy |
ingestPipeline | Name of the Elasticsearch ingest pipeline to execute when indexing | - |
purgeOnReprocess | When reprocessing a stream, first delete any documents from the index matching the stream ID | true |
refreshAfterEachBatch | Refresh the index after each batch is processed, making the indexed documents visible to searches | false |
HttpPostFilter
TODO - Add description
Element properties:
Name | Description | Default Value |
---|---|---|
receivingApiUrl | The URL of the receiving API. | - |
IdEnrichmentFilter
TODO - Add description
IndexingFilter
A filter to send source data to an index.
Element properties:
Name | Description | Default Value |
---|---|---|
index | The index to send records to. | - |
RecordCountFilter
TODO - Add description
Element properties:
Name | Description | Default Value |
---|---|---|
countRead | Is this filter counting records read or records written? | true |
RecordOutputFilter
TODO - Add description
ReferenceDataFilter
Takes XML input (conforming to the reference-data:2 schema) and loads the data into the Reference Data Store. Reference data values can be either simple strings or XML fragments.
Element properties:
Name | Description | Default Value |
---|---|---|
overrideExistingValues | Allow duplicate keys to override existing values? | true |
warnOnDuplicateKeys | Warn if there are duplicate keys found in the reference data? | false |
SafeXMLFilter
TODO - Add description
SchemaFilter
Checks the format of the source data against one of a number of XML schemas. This ensures that if non-compliant data is generated, it will be flagged as in error and will not be passed to any subsequent processing elements.
Element properties:
Name | Description | Default Value |
---|---|---|
namespaceURI | Limits the schemas that can be used to validate data to those with a matching namespace URI. | - |
schemaGroup | Limits the schemas that can be used to validate data to those with a matching schema group name. | - |
schemaLanguage | The schema language that the schema is written in. | http://www.w3.org/2001/XMLSchema |
schemaValidation | Should schema validation be performed? | true |
systemId | Limits the schemas that can be used to validate data to those with a matching system id. | - |
SearchResultOutputFilter
TODO - Add description
SolrIndexingFilter
Delivers source data to the specified index in an external Solr instance/cluster.
Element properties:
Name | Description | Default Value |
---|---|---|
batchSize | How many documents to send to the index in a single post. | 1000 |
commitWithinMs | Commit indexed documents within the specified number of milliseconds. | -1 |
index | The index to send records to. | - |
softCommit | Perform a soft commit after every batch so that docs are available for searching immediately (if using NRT replicas). | true |
SplitFilter
Splits multi-record source data into smaller groups of records prior to delivery to an XSLT. This allows the XSLT to process data more efficiently than loading a potentially huge input stream into memory.
Element properties:
Name | Description | Default Value |
---|---|---|
splitCount | The number of elements at the split depth to count before the XML is split. | 10000 |
splitDepth | The depth of XML elements to split at. | 1 |
storeLocations | Should this split filter store processing locations. | true |
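The splitting idea can be sketched simply: instead of handing the XSLT one huge document, records are passed through in batches of at most splitCount. A minimal Python sketch (illustrative only; Stroom splits at the XML event level, at the configured element depth):

```python
def split_records(records, split_count=10000):
    """Group records into batches of at most split_count before
    downstream transformation (illustrative sketch, not Stroom code)."""
    for i in range(0, len(records), split_count):
        yield records[i:i + split_count]

batches = list(split_records(list(range(25)), split_count=10))
print([len(b) for b in batches])  # [10, 10, 5]
```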
StatisticsFilter
An element to allow the source data (conforming to the statistics XML Schema) to be sent to the MySQL based statistics data store.
Element properties:
Name | Description | Default Value |
---|---|---|
statisticsDataSource | The statistics data source to record statistics against. | - |
StroomStatsFilter
An element to allow the source data (conforming to the statistics XML Schema) to be sent to an external stroom-stats service.
Element properties:
Name | Description | Default Value |
---|---|---|
flushOnSend | At the end of the stream, wait for acknowledgement from the Kafka broker for all the messages sent. This ensures errors are caught in the pipeline process. | true |
kafkaConfig | The Kafka config to use. | - |
statisticsDataSource | The stroom-stats data source to record statistics against. | - |
XPathExtractionOutputFilter
TODO - Add description
Element properties:
Name | Description | Default Value |
---|---|---|
multipleValueDelimiter | The string to delimit multiple simple values. | , |
XSLTFilter
An element used to transform XML data from one form to another using XSLT. The specified XSLT can be used to transform the input XML into XML conforming to another schema or into other forms such as JSON, plain text, etc.
Element properties:
Name | Description | Default Value |
---|---|---|
pipelineReference | A list of places to load reference data from if required. | - |
suppressXSLTNotFoundWarnings | If XSLT cannot be found to match the name pattern, suppress warnings. | false |
usePool | Advanced: Choose whether or not you want to use cached XSLT templates to improve performance. | true |
xslt | The XSLT to use. | - |
xsltNamePattern | A name pattern to load XSLT dynamically. | - |
Writer
Writers consume XML events (from Parsers and Filters) and convert them into a stream of bytes using the character encoding configured on the Writer (if applicable). The output data can then be fed to a Destination.
JSONWriter
Writer to convert XML data conforming to the http://www.w3.org/2013/XSL/json XML Schema into JSON format.
Element properties:
Name | Description | Default Value |
---|---|---|
encoding | The output character encoding to use. | UTF-8 |
indentOutput | Should output JSON be indented and include new lines (pretty printed)? | false |
TextWriter
Writer to convert XML character data events into plain text output.
Element properties:
Name | Description | Default Value |
---|---|---|
encoding | The output character encoding to use. | UTF-8 |
footer | Footer text that can be added to the output at the end. | - |
header | Header text that can be added to the output at the start. | - |
XMLWriter
Writer to convert XML events data into XML output in the specified character encoding.
Element properties:
Name | Description | Default Value |
---|---|---|
encoding | The output character encoding to use. | UTF-8 |
indentOutput | Should output XML be indented and include new lines (pretty printed)? | false |
suppressXSLTNotFoundWarnings | If XSLT cannot be found to match the name pattern, suppress warnings. | false |
xslt | A previously saved XSLT, used to modify the output via xsl:output attributes. | - |
xsltNamePattern | A name pattern for dynamic loading of an XSLT that will modify the output via xsl:output attributes. | - |
Destination
Destination elements consume a stream of bytes from a Writer and persist them to a destination. This could be a file on a file system or Stroom’s stream store.
AnnotationWriter
TODO - Add description
FileAppender
A destination used to write an output stream to a file on the file system. If multiple paths are specified in the ‘outputPaths’ property it will pick one at random to write to.
Element properties:
Name | Description | Default Value |
---|---|---|
filePermissions | Set file system permissions of finished files (example: ‘rwxr--r--’) | - |
outputPaths | One or more destination paths for output files separated with commas. Replacement variables can be used in path strings such as ${feed}. | - |
rollSize | When the current output file exceeds this size it will be closed and a new one created. | - |
splitAggregatedStreams | Choose if you want to split aggregated streams into separate output files. | false |
splitRecords | Choose if you want to split individual records into separate output files. | false |
useCompression | Apply GZIP compression to output files | false |
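The path selection and variable substitution described above can be sketched in Python. This is an illustrative sketch only: the `${feed}` syntax is from the property description, but the random choice shown here is a simplification of Stroom's behaviour:

```python
import random
import string

def choose_output_path(output_paths: str, variables: dict) -> str:
    """Pick one of the comma separated paths at random and expand
    ${...} replacement variables (illustrative sketch)."""
    paths = [p.strip() for p in output_paths.split(",")]
    chosen = random.choice(paths)
    # string.Template happens to use the same ${name} syntax
    return string.Template(chosen).substitute(variables)

print(choose_output_path("/data/out/${feed}", {"feed": "MY-FEED"}))
# /data/out/MY-FEED
```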
HDFSFileAppender
A destination used to write an output stream to a file on a Hadoop Distributed File System. If multiple paths are specified in the ‘outputPaths’ property it will pick one at random.
Element properties:
Name | Description | Default Value |
---|---|---|
fileSystemUri | URI for the Hadoop Distributed File System (HDFS) to connect to, e.g. hdfs://mynamenode.mydomain.com:8020 | - |
outputPaths | One or more destination paths for output files separated with commas. Replacement variables can be used in path strings such as ${feed}. | - |
rollSize | When the current output file exceeds this size it will be closed and a new one created. | - |
runAsUser | The user to connect to HDFS as | - |
splitAggregatedStreams | Choose if you want to split aggregated streams into separate output files. | false |
splitRecords | Choose if you want to split individual records into separate output files. | false |
HTTPAppender
A destination used to write an output stream to a remote HTTP(s) server.
Element properties:
Name | Description | Default Value |
---|---|---|
connectionTimeout | How long to wait before we abort sending data due to connection timeout | - |
contentType | The content type | application/json |
forwardChunkSize | Should data be sent in chunks and if so how big should the chunks be | - |
forwardUrl | The URL to send data to | - |
hostnameVerificationEnabled | Verify host names | true |
httpHeadersIncludeStreamMetaData | Provide stream metadata as HTTP headers | true |
httpHeadersUserDefinedHeader1 | Additional HTTP Header 1, format is ‘HeaderName: HeaderValue’ | - |
httpHeadersUserDefinedHeader2 | Additional HTTP Header 2, format is ‘HeaderName: HeaderValue’ | - |
httpHeadersUserDefinedHeader3 | Additional HTTP Header 3, format is ‘HeaderName: HeaderValue’ | - |
keyStorePassword | The key store password | - |
keyStorePath | The key store file path on the server | - |
keyStoreType | The key store type | JKS |
logMetaKeys | Which meta data values will be logged in the send log | guid,feed,system,environment,remotehost,remoteaddress |
readTimeout | How long to wait for data to be available before closing the connection | - |
requestMethod | The request method, e.g. POST | POST |
rollSize | When the current output exceeds this size it will be closed and a new one created. | - |
splitAggregatedStreams | Choose if you want to split aggregated streams into separate output. | false |
splitRecords | Choose if you want to split individual records into separate output. | false |
sslProtocol | The SSL protocol to use | TLSv1.2 |
trustStorePassword | The trust store password | - |
trustStorePath | The trust store file path on the server | - |
trustStoreType | The trust store type | JKS |
useCompression | Should data be compressed when sending | true |
useJvmSslConfig | Use JVM SSL config. Set this to true if the Stroom node has been configured with key/trust stores using Java system properties like ‘javax.net.ssl.keyStore’. Set this to false if you are explicitly setting key/trust store properties on this HttpAppender. | true |
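The shape of the request this element sends can be sketched with Python's standard library. This is an illustrative sketch only: the header names used below are assumptions for demonstration, not Stroom's exact wire format:

```python
import urllib.request

def build_request(forward_url, body: bytes, meta: dict,
                  content_type="application/json"):
    """Build an HTTP POST carrying stream metadata as headers,
    as HTTPAppender does (illustrative sketch; header names assumed)."""
    req = urllib.request.Request(forward_url, data=body, method="POST")
    req.add_header("Content-Type", content_type)
    for name, value in meta.items():  # httpHeadersIncludeStreamMetaData=true
        req.add_header(name, value)
    return req

req = build_request("https://example.com/datafeed", b'{"id":1}',
                    {"Feed": "MY-FEED", "System": "EXAMPLE"})
print(req.get_method(), req.get_header("Feed"))  # POST MY-FEED
```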
RollingFileAppender
A destination used to write an output stream to a file on the file system.
If multiple paths are specified in the ‘outputPaths’ property it will pick one at random to write to.
This is distinct from the FileAppender in that when the rollSize is reached it will move the current file to the path specified in rolledFileName and resume writing to the original path. This allows other processes to follow the changes to a single file path, e.g. when using tail.
Element properties:
Name | Description | Default Value |
---|---|---|
fileName | Choose the name of the file to write. | - |
filePermissions | Set file system permissions of finished files (example: ‘rwxr--r--’) | - |
frequency | Choose how frequently files are rolled. | 1h |
outputPaths | One or more destination paths for output files separated with commas. Replacement variables can be used in path strings such as ${feed}. | - |
rollSize | When the current output file exceeds this size it will be closed and a new one created, e.g. 10M, 1G. | 100M |
rolledFileName | Choose the name that files will be renamed to when they are rolled. | - |
schedule | Provide a cron expression to determine when files are rolled. | - |
useCompression | Apply GZIP compression to output files | false |
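The size-based rolling described above can be sketched in Python (an illustrative sketch of the idea, not Stroom's implementation; Stroom also rolls on the frequency/schedule properties):

```python
import os

def roll_if_needed(file_name: str, rolled_file_name: str, roll_size: int) -> bool:
    """When the active file reaches roll_size bytes, move it to the rolled
    name so writing can resume on the original path (illustrative sketch)."""
    if os.path.exists(file_name) and os.path.getsize(file_name) >= roll_size:
        # A follower such as `tail -F file_name` keeps working across the roll
        os.replace(file_name, rolled_file_name)
        return True
    return False
```

Because the active path never changes, a process following it only ever sees the file truncate back to empty after each roll.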
RollingStreamAppender
A destination used to write one or more output streams to a new stream which is then rolled when it reaches a certain size or age. A new stream is created once the size or age criterion has been met.
Element properties:
Name | Description | Default Value |
---|---|---|
feed | The feed that output stream should be written to. If not specified the feed the input stream belongs to will be used. | - |
frequency | Choose how frequently streams are rolled. | 1h |
rollSize | Choose the maximum size that a stream can be before it is rolled. | 100M |
schedule | Provide a cron expression to determine when streams are rolled. | - |
segmentOutput | Should the output stream be marked with indexed segments to allow fast access to individual records? | true |
streamType | The stream type that the output stream should be written as. This must be specified. | - |
StandardKafkaProducer
TODO - Add description
Element properties:
Name | Description | Default Value |
---|---|---|
flushOnSend | At the end of the stream, wait for acknowledgement from the Kafka broker for all the messages sent. This ensures errors are caught in the pipeline process. | true |
kafkaConfig | Kafka configuration details relating to where and how to send Kafka messages. | - |
StreamAppender
TODO - Add description
Element properties:
Name | Description | Default Value |
---|---|---|
feed | The feed that output stream should be written to. If not specified the feed the input stream belongs to will be used. | - |
rollSize | When the current output stream exceeds this size it will be closed and a new one created. | - |
segmentOutput | Should the output stream be marked with indexed segments to allow fast access to individual records? | true |
splitAggregatedStreams | Choose if you want to split aggregated streams into separate output streams. | false |
splitRecords | Choose if you want to split individual records into separate output streams. | false |
streamType | The stream type that the output stream should be written as. This must be specified. | - |
StroomStatsAppender
TODO - Add description
Element properties:
Name | Description | Default Value |
---|---|---|
flushOnSend | At the end of the stream, wait for acknowledgement from the Kafka broker for all the messages sent. This ensures errors are caught in the pipeline process. | true |
kafkaConfig | The Kafka config to use. | - |
maxRecordCount | Choose the maximum number of records or events that a message will contain | 1 |
statisticsDataSource | The stroom-stats data source to record statistics against. | - |