Event Data Formats

The data formats to use when sending data to Stroom.

1: Character Encoding
2: Event XML Fragments

Stroom accepts data in many different forms as long as they are text data and are in one of the supported character encodings. The following is a non-exhaustive list of formats supported by Stroom:

Event XML fragments
Events XML
JSON
Delimited data, with and without a header row (e.g CSV, TSV, etc.)
Fixed width text data
Multi line data (where each line can be a different format), e.g. Auditd.

Preferred format

Where the system/application generating the logs is developed by you and the log format is under your control, the preferred format is Events XML or Event XML fragments. The reason for this is that all data in Stroom will be normalised into a standard form. This standard form is controlled by the event-logging XML Schema . If data is sent in Events/Event XML then it will not require any additional translation.

1 - Character Encoding

Details of the character encodings supported by Stroom.

When data is sent to Stroom the character encoding of the data should be configured for the Feed. This tells Stroom how to decode the data that has been sent. All data sent to a feed must be encoded in the character encode configured for that Feed.

Supported Character Encodings

The currently supported character encodings are:

UTF-8

This is the default character encoding A variable width character encoding consisting of one to four bytes per ‘character’. UTF-8 is supported with or without a Byte Order Mark.

UTF-16

A variable width character encoding consisting of two or four bytes per ‘character’. UTF-16 can be encoded with either Big (UTF16-BE) or Little (UTF16-LE) Endianness depending on the system that encoded it. The Byte Order Mark will specify the endianness but is optional.

UTF-32

A fixed width character encoding consisting of four bytes per ‘character’. UTF-32 can be encoded with either Big (UTF32-BE) or Little (UTF32-LE) Endianness depending on the system that encoded it. The Byte Order Mark will specify the endianness but is optional.

ASCII

A single byte character encoding supporting only 128 characters. This character encoding has very limited use as it does not support accented characters or emojis so should be avoided for any logs that capture user input where these characters may occur.

Byte Order Mark (BOM)

A Byte Order Mark (BOM) is a special Unicode character at the start of a text stream that indicates the byte order (or endianness) of the stream. It can also be used to determine the character encoding of the stream.

Stroom can handle the presence of BOMs in the stream and can use it to determine the character encoding.

Encoding	BOM
UTF8	`EF` `BB` `BF`
UTF16-LE	`FF` `FE`
UTF16-BE	`FE` `FF`
UTF32-LE	`FF` `FE` `00` `00`
UTF32-BE	`00` `00` `FE` `FF`

2 - Event XML Fragments

Description of the Event XML Fragments

This format is a file containing multiple <Event>...</Event> element blocks but without any root element, or any XML processing instruction. For example, a file may look like:

<Event>
  ...
</Event>
<Event>
  ...
</Event>
<Event>
  ...
</Event>

Each <Evemt> element is valid against the event-logging XML Schema but the file is not as it contains no root element. This is the output format used by the event-logging Java library.