Sending Data to Stroom

How to send data (event logs) into Stroom or one of its proxies.

Stroom and Stroom Proxy have a simple HTTP POST interface that requires HTTP header arguments to be supplied, as described in Header Arguments.

Files are posted to Stroom and Stroom Proxy as described in Payloads.

Stroom will return a response code indicating the success or failure of the post, as described in Response Codes.

Data can be sent from any operating system or application. Some examples to aid in sending data can be found in Example Clients.

It is common practice for the developers/admins of a client system to write the translation to normalise their data as they’re in the best position to understand their logging and to generate specific events as required. See here for further details.

1 - Data Formats

The data formats to use when sending data to Stroom.

Stroom accepts data in many different forms as long as they are text data and are in one of the supported character encodings. The following is a non-exhaustive list of formats supported by Stroom:

  • Event XML fragments
  • Events XML
  • JSON
  • Delimited data, with and without a header row (e.g. CSV, TSV, etc.)
  • Fixed width text data
  • Multi line data (where each line can be a different format), e.g. Auditd.

Preferred format

Where the system/application generating the logs is developed by you and the log format is under your control, the preferred format is Events XML or Event XML fragments. The reason for this is that all data in Stroom will be normalised into a standard form. This standard form is controlled by the event-logging XML Schema. If data is sent as Events XML or Event XML fragments then it will not require any additional translation.

1.1 - Character Encoding

Details of the character encodings supported by Stroom.

When data is sent to Stroom the character encoding of the data should be configured for the Feed.

Supported Character Encodings

The currently supported character encodings are:

UTF-8

This is the default character encoding. A variable width character encoding consisting of one to four bytes per ‘character’.

UTF-16

A variable width character encoding consisting of two or four bytes per ‘character’. UTF-16 can be encoded with either Big (UTF16-BE) or Little (UTF16-LE) Endianness depending on the system that encoded it. The Byte Order Mark will specify the endianness.

UTF-32

A fixed width character encoding consisting of four bytes per ‘character’. UTF-32 can be encoded with either Big (UTF32-BE) or Little (UTF32-LE) Endianness depending on the system that encoded it. The Byte Order Mark will specify the endianness.

ASCII

A single byte character encoding supporting only 128 characters. This character encoding has very limited use as it does not support accented characters or emojis so should be avoided for any logs that capture user input where these characters may occur.

Byte Order Mark (BOM)

A Byte Order Mark (BOM) is a special Unicode character at the start of a text stream that indicates the byte order (or endianness) of the stream. It can also be used to determine the character encoding of the stream.

Stroom can handle the presence of a BOM in the stream and can use it to determine the character encoding.

Encoding  BOM
UTF8      EF BB BF
UTF16-LE  FF FE
UTF16-BE  FE FF
UTF32-LE  FF FE 00 00
UTF32-BE  00 00 FE FF
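
The BOM values in the table above can be checked on the client side before sending. The following is a minimal Python sketch, not Stroom's own detection logic; the function name detect_bom is hypothetical. The UTF-32 BOMs are tested first because the UTF-32 little endian BOM (FF FE 00 00) begins with the UTF-16 little endian BOM (FF FE).

```python
import codecs

# Longer BOMs come first: FF FE 00 00 (UTF-32LE) starts with FF FE (UTF-16LE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def detect_bom(data):
    """Return (encoding name, BOM length) for the given bytes, or (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


print(detect_bom(b"\xef\xbb\xbf<Event>"))
```

A stream with no BOM returns (None, 0), in which case the encoding configured for the Feed is the only information available.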

1.2 - Event XML Fragments

Description of the Event XML Fragments

This format is a file containing multiple <Event>...</Event> element blocks but without any root element, or any XML processing instruction. For example, a file may look like:

<Event>
  ...
</Event>
<Event>
  ...
</Event>
<Event>
  ...
</Event>

Each <Event> element is valid against the event-logging XML Schema, but the file as a whole is not because it contains no root element. This is the output format used by the event-logging Java library.
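
Because a fragment file has no root element, a standard XML parser will reject it as-is. As a rough illustration (not part of Stroom or the event-logging library), the Python sketch below wraps the text in a throwaway root element so the fragments can be counted or inspected before sending. The element name Wrapper, the function name and the Id child element are all hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_event_fragments(text):
    """Parse a string of concatenated <Event> blocks and return the elements."""
    # "Wrapper" is a throwaway root used only so the parser accepts the text.
    root = ET.fromstring("<Wrapper>" + text + "</Wrapper>")
    # Compare local names so namespaced <Event> elements are matched too.
    return [child for child in root if child.tag.rsplit("}", 1)[-1] == "Event"]


sample = "<Event><Id>1</Id></Event>\n<Event><Id>2</Id></Event>\n"
events = parse_event_fragments(sample)
print(len(events))
```

This only checks that the fragments are well formed; it does not validate them against the event-logging XML Schema.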

2 - Example Clients

A collection of example client applications for sending data to Stroom or one of its proxies.

The following article provides examples to help data providers send data to Stroom via the HTTPS interface. The code for the clients is in the stroom-clients repository.

2.1 - curl (Linux)

How to use the curl command to send data to Stroom.

Curl is a standard unix tool to send data to or from a server. In the following examples -H is used to specify the header arguments required by Stroom, see Header Arguments.

Notes:

  • The @ character must be used in front of the file being posted. If it is not, curl will post the file name instead of its contents.
  • The --data-binary argument must always be used, even for text formats, to prevent data corruption caused by curl stripping out newlines.

Example HTTPS post without authentication:

curl -k --data-binary @file.dat "https://<Stroom_HOST>/stroom/datafeed" \
-H "Feed:EXAMPLE_FEED" \
-H "System:EXAMPLE_SYSTEM" \
-H "Environment:EXAMPLE_ENVIRONMENT"

In the above example -k is required to stop curl from authenticating the server. The next example must be used to supply the necessary CA to authenticate the server if this is required.

Example HTTPS With 1 way SSL authentication:

curl --cacert root_ca.crt --data-binary @file.dat "https://<Stroom_HOST>/stroom/datafeed" \
-H "Feed:EXAMPLE_FEED" \
-H "System:EXAMPLE_SYSTEM" \
-H "Environment:EXAMPLE_ENVIRONMENT"

The above example verifies that the certificate presented by Stroom is signed by the CA. The CA is provided to curl using the '--cacert root_ca.crt' parameter.

For step by step instructions for creating, configuring and testing the PKI authentication, see the SSL Guide.

Example HTTPS With 2 way SSL authentication:

curl --cert example.pem --cacert root_ca.crt --data-binary @file.dat "https://<Stroom_HOST>/stroom/datafeed" \
-H "Feed:EXAMPLE_FEED" \
-H "System:EXAMPLE_SYSTEM" \
-H "Environment:EXAMPLE_ENVIRONMENT"

The above example both verifies that the certificate presented by Stroom is signed by the CA and provides a certificate so the data provider can authenticate itself with Stroom. The data provider supplies its certificate using the '--cert example.pem' parameter.
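
For clients written in Python rather than curl, the same one-way and two-way options can be expressed with the standard ssl module. This is a sketch under assumptions: the function name make_ssl_context is hypothetical, and the file names simply mirror the curl examples above.

```python
import ssl


def make_ssl_context(cacert=None, client_pem=None):
    """Build a TLS context comparable to curl's --cacert and --cert options."""
    # cafile plays the role of --cacert; None falls back to the system CA store.
    context = ssl.create_default_context(cafile=cacert)
    if client_pem is not None:
        # Plays the role of --cert: a PEM file holding the certificate and key.
        context.load_cert_chain(certfile=client_pem)
    return context


# One-way:  make_ssl_context(cacert="root_ca.crt")
# Two-way:  make_ssl_context(cacert="root_ca.crt", client_pem="example.pem")
context = make_ssl_context()
print(context.verify_mode == ssl.CERT_REQUIRED)
```

The resulting context can be passed to urllib.request.urlopen via its context argument. Unlike curl -k, the default context always verifies the server certificate and host name.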

If your input file is not compressed you should compress it as follows:

gzip -c uncompressedfile.dat \
| curl --cert example.pem --cacert root_ca.crt --data-binary @- "https://<Stroom_HOST>/stroom/datafeed" \
-H "Feed:EXAMPLE_FEED" \
-H "System:EXAMPLE_SYSTEM" \
-H "Environment:EXAMPLE_ENVIRONMENT" \
-H "Compression:Gzip"

When delivering data from a RHEL4 host, an additional header argument must be added to specify the FQDN of the host:

-H "Hostname:host.being.audited"

The hostname being sent as a header argument may be resolved upon execution using the command hostname -f.

SSL Notes

To create a .pem format key, simply append the private key and certificate:

cat <NAME>.key >> <NAME>.pem
cat <NAME>.crt >> <NAME>.pem

To remove the pass phrase from an OpenSSL private key, use:

openssl rsa -in server.key -out server-clear.key

The send-logs.sh script assumes the period start and end times are embedded in the file name (e.g. log_2010-01-01T12:00:00.000Z_2010-01-02T12:00:00.000Z.log). The certificates will need to be added to the script as above.
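
The file naming convention in the example above can be parsed as follows. This Python sketch is only an illustration of the convention; it is not taken from send-logs.sh, and the function name parse_period is hypothetical.

```python
import re
from datetime import datetime, timezone

# Two ISO 8601 timestamps separated by "_" and followed by the file extension.
_STAMP = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z"
_PERIOD = re.compile(r"_(" + _STAMP + r")_(" + _STAMP + r")\.")


def parse_period(file_name):
    """Return (start, end) as UTC datetimes, or None if no period is embedded."""
    match = _PERIOD.search(file_name)
    if match is None:
        return None
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ"
    start, end = (
        datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)
        for stamp in match.groups()
    )
    return start, end


period = parse_period("log_2010-01-01T12:00:00.000Z_2010-01-02T12:00:00.000Z.log")
print(period)
```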

2.2 - curl (Windows)

Using Curl on Windows to send data to Stroom.

There is a version of curl for Windows.

From Windows 10 build 17063 onwards, curl is included natively, so you can execute it directly from Cmd.exe or PowerShell.exe. Curl.exe is located at c:\windows\system32 (which is included in the standard PATH environment variable), so all you need to do is run Command Prompt with administrative rights and you can use curl. For older versions of Windows, the cURL project has Windows binaries.

curl -s -k --data-binary @file.dat "https://stroomp.strmdev00.org/stroom/datafeed" -H"Feed:TEST-FEED-V1_0" -H"System:EXAMPLE_SYSTEM" -H"Environment:EXAMPLE_ENVIRONMENT"

Figure: Windows curl CLI

2.3 - event-logging (Java library)

A Java library for logging events in Java applications.

event-logging is a Java API for logging audit events conforming to the Event Logging XML Schema. The API uses a generated Java JAXB model of the Event Logging XML Schema. Event Logging can be incorporated into your Java application to provide a means of recording and outputting audit events or user actions for compliance, security or monitoring.

This library only generates the events. By default, XML events are written to a file using a logging appender. To send the events to Stroom, the logged files will need to be sent using one of the other clients.

2.4 - send_to_stroom.sh (Linux)

A shell script for sending logs to Stroom or one of its proxies.

send_to_stroom.sh is a small bash script to make it easier to send data to Stroom. To use it, download the following files using wget or similar, replacing the value of SEND_TO_STROOM_VER with the latest released version:

SEND_TO_STROOM_VER="send-to-stroom-v2.0" && \
    wget "https://raw.githubusercontent.com/gchq/stroom-clients/${SEND_TO_STROOM_VER}/bash/send_to_stroom.sh" && \
    wget "https://raw.githubusercontent.com/gchq/stroom-clients/${SEND_TO_STROOM_VER}/bash/send_to_stroom_args.sh" && \
    chmod u+x send_to_stroom*.sh

To see the help for send_to_stroom.sh, enter ./send_to_stroom.sh --help

The following is an example of using send_to_stroom.sh to send all logs in a directory:

./send_to_stroom.sh \
    --delete-after-sending \
    --file-regex ".*/access-[0-9]+.*\.log(\.gz)?$" \
    --key ./client.key \
    --cert ./client.pem.crt \
    --cacert ./ca.pem.crt \
    /some_directory/logs \
    MY_FEED \
    MY_SYSTEM \
    DEV \
    https://stroom-host/stroom/datafeed

2.5 - Simple C# Client

A simple C# client for sending data files to Stroom.

The StroomCSharpClient is a C# port of the Java client and behaves in the same way. Note that this is just an example, not a fully functional client. See StroomCSharpClient.

2.6 - Simple Java Client

A simple Java client for sending data files to Stroom.

The stroom-java-client provides an example Java client that can:

  • Read a zip, gzip or uncompressed input file.
  • Perform an HTTP post of data with zip, gzip or no compression.
  • Pass down arguments on the command line as HTTP request arguments.
  • Support HTTP and HTTPS with 1 or 2 way authentication.

(N.B. arguments must be in lower case).

To use the example client first compile the Java code:

javac DataFeedClient.java

Example HTTP Post:

java \
-classpath . \
DataFeedClient \
inputfile=datafeed \
url=http://<Stroom_HOST>/stroom/datafeed \
system=EXAMPLE-SYSTEM \
environment=DEV \
feed=EXAMPLE-FEED

Example HTTPS With 1 way SSL authentication:

java \
-classpath . \
-Djavax.net.ssl.trustStore=ca.jks \
-Djavax.net.ssl.trustStorePassword=capass \
DataFeedClient \
inputfile=datafeed \
url=https://<Stroom_HOST>/stroom/datafeed \
system=EXAMPLE-SYSTEM \
environment=DEV \
feed=EXAMPLE-FEED

Example HTTPS With 2 way SSL authentication:

java \
-classpath . \
-Djavax.net.ssl.trustStore=ca.jks \
-Djavax.net.ssl.trustStorePassword=capass \
-Djavax.net.ssl.keyStore=example.jks \
-Djavax.net.ssl.keyStorePassword=<PASSWORD> \
DataFeedClient \
inputfile=datafeed \
url=https://<Stroom_HOST>/stroom/datafeed \
system=EXAMPLE-SYSTEM \
environment=DEV \
feed=EXAMPLE-FEED

2.7 - stroom-log-sender (Docker)

A Docker image for periodically sending log files generated by an application to Stroom.

stroom-log-sender is a small Docker image for sending data to Stroom.

This is the simplest way to get data into Stroom if the data provider is itself running in Docker. It can also be used for sending data to Stroom from data providers that are not running in Docker. stroom-log-sender makes use of the send_to_stroom.sh bash script described in section 2.4. For details on how to use stroom-log-sender, see its Docker Hub page.

2.8 - VBScript (Windows)

Using VBScript to send data to Stroom.

extract-data.vbs uses wevtutil.exe to extract Security event information from the Windows event log. This script has been tested on Windows 2008.

This script is designed to run periodically (say every 10 minutes). The first time the script is run it stores the current time in UTC format in the registry. Subsequent calls then extract event information from the last run time to the new current time. The events are stored in a zip file with the period dates embedded.

The script requires a working directory used as a buffer for the zip files. This can be set at the start of the script otherwise it will default to the working directory.

The send-data.vbs script is designed to run periodically (say every 10 minutes). The script will scan for zip files and send them to Stroom.

The script details several parameters that require setting per environment. Among these are the working directory that the zip files are stored in, the feed name and the URL of Stroom.

SSL

To send data over SSL (HTTPS) you must import a client certificate in p12 format into Windows. To convert a certificate (.crt) and private key (.key) into p12 format, use the following command:

openssl pkcs12 -export -in <NAME>.crt -inkey <NAME>.key -out <NAME>.p12 -name "<NAME>"

Once in p12 format, use the Windows certificate wizard to import the public/private key pair.

The send-data-tree.vbs script works through a directory for different feed types.

2.9 - wget (Windows)

Using wget on Windows to send data to Stroom.

There is a version of wget for Windows.

  • Use --post-file argument to supply the data
  • Use --certificate and --certificate-type arguments to specify your client certificate
  • Use --header argument to inform Stroom which feed and environment your data relates to

3 - Header Arguments

The various HTTP headers that can be sent with data.

The following data must be passed in as HTTP header arguments when sending files to Stroom via HTTP POST. These arguments are case insensitive.

  • System - The name by which the system is known within the organisation, e.g. PAYROLL_SYSTEM. This could be the name of a project/service or capability.

  • Environment - A means to identify the deployed instance of a system. This may indicate the deployment status, e.g. DEV, REF, LIVE, OPS, etc., and/or the location where the instance is deployed. An environment may be a combination of these attributes separated with an underscore.

  • Feed - The name of the feed this data relates to. This is mandatory and must match a feed defined within Stroom in order for Stroom to accept the data and know what to do with it.

  • Compression - This token is optionally used when the POST payload is compressed with either gzip or zip compression. Values of ZIP and GZIP are valid. Note: The Compression token MUST NOT be used in conjunction with the standard HTTP header token Content-Encoding, otherwise Stroom will be unable to uncompress the data. Use either Compression:GZIP or Content-Encoding:gzip, not both. Using Compression is preferred.

  • EffectiveTime - This is only applicable to reference data. It indicates the point in time from which the reference data applies, i.e. all event data created after the effective time will use this reference data until a new reference data item arrives with a later effective time. Note: This argument must be in ISO 8601 date time format, i.e. yyyy-MM-ddTHH:mm:ss.sssZ.
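
As an illustration of the EffectiveTime format, the following Python sketch renders a timestamp in the yyyy-MM-ddTHH:mm:ss.sssZ pattern. It assumes that the trailing Z denotes UTC and that the fractional part is milliseconds; the function name is hypothetical.

```python
from datetime import datetime, timezone


def format_effective_time(moment):
    """Format a timezone-aware datetime as yyyy-MM-ddTHH:mm:ss.sssZ in UTC."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    utc = moment.astimezone(timezone.utc)
    millis = utc.microsecond // 1000  # truncate microseconds to milliseconds
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


print(format_effective_time(datetime(2010, 1, 1, 12, 0, 0, 5000, tzinfo=timezone.utc)))
```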

Example header arguments for a feed called MY_SYSTEM-EVENTS from system MY_SYSTEM and environment OPS:

System:MY_SYSTEM
Environment:OPS
Feed:MY_SYSTEM-EVENTS

The post payload must contain the events file. If the compression format is ZIP, the payload must contain ZIP entries with the events files and optional context files ending in .ctx. Further details of supported payload formats can be found in Payloads.
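
Putting the header arguments and payload together, a request could be assembled as in the Python sketch below. It only builds the request and does not send it. The function name and the host stroom.example.org are hypothetical; the header names and the /stroom/datafeed path are the ones used elsewhere in this document.

```python
import gzip
import urllib.request


def build_datafeed_request(url, payload, feed, system, environment, compress=True):
    """Build (but do not send) an HTTP POST request for a Stroom datafeed URL."""
    headers = {"Feed": feed, "System": system, "Environment": environment}
    if compress:
        # Use the Compression header only; do not also set Content-Encoding.
        payload = gzip.compress(payload)
        headers["Compression"] = "GZIP"
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")


request = build_datafeed_request(
    "https://stroom.example.org/stroom/datafeed",  # hypothetical host
    b"<Event>...</Event>\n",
    feed="MY_SYSTEM-EVENTS",
    system="MY_SYSTEM",
    environment="OPS",
)
print(request.get_method(), request.header_items())
```

Sending is left to the caller, for example with urllib.request.urlopen(request), optionally passing an ssl context for certificate handling.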

4 - Response Codes

The HTTP response codes returned by stroom.

Stroom will return a HTTP response code to indicate success or failure. An additional response Header “Stroom-Status” will indicate a more precise error message code. A user readable message will appear in the response body.

HTTP Status  Stroom-Status  Message                            Reason
200          0              OK                                 Post of data successful
406          100            Feed must be specified             You must provide Feed as a header argument in the request
406          110            Feed is not set to receive data    The feed you have provided is not set up to receive data (maybe does not exist or is set to reject)
406          200            Unknown compression                Compression argument must be one of ZIP, GZIP and NONE
401          300            Client Certificate Required        The feed you have provided requires a client HTTPS certificate to send data
403          310            Client Certificate not authorised  The feed you have provided does not allow your client certificate to send data
500          400            Compressed stream invalid          The stream of data sent does not form a valid compressed file. Maybe it terminated unexpectedly or is corrupt.
500          999            Unknown error                      An unknown unexpected error occurred

In the event that data is not successfully received by Stroom, i.e. the response code is not 200, the client system should buffer data and keep trying to re-send it. Data should only be removed from the client system when it has been sent successfully.
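
The buffer-and-retry behaviour described above might look like the following Python sketch. It is one possible approach, not a prescribed one: the function name, the exponential back-off and the attempt limit are all assumptions. The send argument is any callable that posts the payload and returns the HTTP status code.

```python
import time


def send_with_retry(send, payload, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send(payload) until it returns 200; return True on success."""
    for attempt in range(1, max_attempts + 1):
        try:
            status = send(payload)
        except OSError:
            # Network failures (including urllib errors) count as a failed attempt.
            status = None
        if status == 200:
            return True
        if attempt < max_attempts:
            # Wait 1x, 2x, 4x, ... the base delay between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    return False


# Demonstration with a stand-in sender that fails twice and then succeeds.
responses = iter([500, 500, 200])
print(send_with_retry(lambda data: next(responses), b"events", sleep=lambda s: None))
```

The caller should delete the buffered data only when the function returns True. A real client may also want to log the Stroom-Status header, since errors such as a missing Feed header will not be fixed by retrying.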

5 - Payloads

Description of the data formats for sending data into a Stroom instance.

Stroom can support multiple payload formats with different compression applied. However all data once uncompressed must be text and not binary.

Stroom can receive data in the following formats:

  • Uncompressed - Text data is sent to Stroom and no compression flag is set in the header arguments.
  • GZIP - Text data is GZIP compressed and the compression flag is set to GZIP.
  • ZIP - A text file is compressed into a ZIP archive and sent to Stroom with the compression flag set to ZIP. The ZIP file must contain one data file and an optional context file, see below.

Context Files

ZIP files sent to Stroom are expected to contain the data file and an optional context file (*.ctx). If provided, a context file can be used to supply reference data that is specific to the data file that has been sent. Context data is supplementary information that is not contained within logged events, e.g. the machine name or IP address may be delivered in a context file if the application does not write it in each logged event.
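
A ZIP payload with a data file and a context file could be built in memory as in the Python sketch below. The entry names 001.dat and 001.ctx, the function name and the context file content are assumptions made for illustration; this document only states that context files end in .ctx.

```python
import io
import zipfile


def build_zip_payload(data, context=None, base_name="001"):
    """Return ZIP bytes holding one data entry and an optional .ctx entry."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(base_name + ".dat", data)  # hypothetical data entry name
        if context is not None:
            archive.writestr(base_name + ".ctx", context)
    return buffer.getvalue()


payload = build_zip_payload(b"event data\n", b"machine=host1\n")
with zipfile.ZipFile(io.BytesIO(payload)) as archive:
    print(archive.namelist())
```

The resulting bytes would be posted with the compression flag set to ZIP.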

Character Encodings

Although Stroom only supports data in text format, text can be encoded using multiple character encodings. Supported encodings include:

  • ISO-8859-1 (understood by default)
  • Windows-1252 - ANSI (understood by default)
  • ASCII (understood by default)
  • UTF-8 (with or without BOM)
  • UTF-16LE (little endian with or without BOM)
  • UTF-16BE (big endian with or without BOM)
  • UTF-32LE (little endian with or without BOM)
  • UTF-32BE (big endian with or without BOM)

To tell Stroom which character encoding to use, configure the feed that the data belongs to within the Stroom application. Separate character encodings can be specified for logged event data and context data.

6 - SSL Configuration

Configuring SSL with cURL.

This page provides a step by step guide to getting PKI authentication working correctly for Unix hosts so as to be able to sign deliveries from cURL.

First make sure you have a copy of your organisation's CA certificate.

Check that the CA certificate works by running the following command:

echo "Test" | curl --cacert CA.crt --data-binary @- "https://<Stroom_HOST>/stroom/datafeed"

If the response starts with the line:

curl: (60) SSL certificate problem, verify that the CA cert is OK.

then you do not have the correct CA certificate.

If the response contains the line

HTTP Status 406 - Stroom Status 100 - Feed must be specified

then one-way SSL authentication using the CA certificate is successful.

The VBScript file to check Windows certificates is check-certs.vbs.

Final Testing

Once one-way authentication has been tested, two-way authentication should be configured:

The server certificate and private key should be concatenated to create a PEM file:

cat hostname.cert hostname.key > hostname.pem

Finally, test for 2-way authentication:

echo "Test" | curl --cacert CA.crt --cert hostname.pem --data-binary @- "https://<Stroom_HOST>/stroom/datafeed"

If the response contains the line

HTTP Status 406 - Stroom Status 100 - Feed must be specified

then two-way SSL authentication is successful.

Final Tidy Up

The files CA.crt and hostname.pem are the only files required for two-way authentication and should be stored permanently on the server; all other remaining files may be deleted or backed up if required.

Certificate Expiry

PKI certificates expire after 2 years. To check the expiry date of a certificate, run the following command:

openssl x509 -in /path/to/certificate.pem -noout -enddate

This will give a response looking similar to:

notAfter=Aug 15 10:01:42 2013 GMT
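
To act on that output automatically, for example to warn before a certificate expires, the line can be parsed as in the Python sketch below. The function names are hypothetical, and the parsing assumes the English month abbreviations and GMT suffix shown in the example output.

```python
from datetime import datetime, timezone


def parse_not_after(line):
    """Parse a line such as 'notAfter=Aug 15 10:01:42 2013 GMT' into a UTC datetime."""
    value = line.strip().split("=", 1)[1]
    parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y %Z")
    return parsed.replace(tzinfo=timezone.utc)


def days_until_expiry(line, now):
    """Whole days between now (timezone-aware) and the certificate expiry."""
    return (parse_not_after(line) - now).days


expiry = parse_not_after("notAfter=Aug 15 10:01:42 2013 GMT")
print(expiry.isoformat())
```

Calling days_until_expiry with datetime.now(timezone.utc) gives a value that can be compared against a warning threshold.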

7 - Java Keystores

How to create java key/trust stores for use with Java client applications.

There are many times when you may wish to create a Java keystore from certificates and keys and vice versa. This guide aims to explain how this can be done.

Import

If you need to create a Java keystore from a .crt and .key then this is how to do it.

Convert your keys to DER format:

openssl x509 -in <YOUR KEY>.crt -inform PEM -out <YOUR KEY>.crt.der -outform DER
openssl pkcs8 -topk8 -nocrypt -in <YOUR KEY>.key -inform PEM -out <YOUR KEY>.key.der -outform DER

ImportKey

Use the ImportKey class in the stroom-java-client library to import keys.

For example:

java ImportKey keystore=<YOUR KEY>.jks keypass=<YOUR PASSWORD> alias=<YOUR KEY> keyfile=<YOUR KEY>.key.der certfile=<YOUR KEY>.crt.der
keytool -import -noprompt -alias CA -file <CA CERT>.crt -keystore ca.jks -storepass ca

Export

ExportKey

Use the ExportKey class in the stroom-java-client library to export keys. If you would like to use curl or similar application but only have keys contained within a Java keystore then they can be exported.

For example:

java ExportKey keystore=<YOUR KEY>.jks keypass=<YOUR PASSWORD> alias=<YOUR KEY>

This will print both the key and certificate to standard out. This can then be copied into a PEM file for use with cURL or other similar application.