Event Feeds

1 - Writing an XSLT Translation

This HOWTO will take you through the production of an XSLT for a feed, including issues such as event filtering, common errors and testing.

Introduction

This document is intended to explain how and why to produce a translation within stroom and how the translation fits into the overall processing within stroom. It is intended for use by the developers/admins of client systems that want to send data to stroom and need to transform their events into event-logging XML format. It’s not intended as an XSLT tutorial, so a basic knowledge of XSLT is assumed. The document contains potentially useful XSLT fragments to show how certain processing activities can be carried out. As with most programming languages, there are likely to be multiple ways of producing the same end result with different degrees of complexity and efficiency. Examples here may not be the best for all situations but do reflect experience built up from many previous translation jobs.

The document should be read in conjunction with other online stroom documentation, in particular Event Processing.

Translation Overview

The translation process for raw logs is a multi-stage process defined by the processing pipeline:

Parser

The parser takes raw data and converts it into an intermediate XML document format. This is only required if source data is not already within an XML document. There are various standard parsers available (although not all may be available on a default stroom build) to cover the majority of standard source formats such as CSV, TSV, CSV with header row and XML fragments.

The language used within the parser is defined within an XML schema located at XML Schemas / data-splitter / data-splitter v3.0 within the tree browser. The data splitter schema may have been provided as part of the core schemas content pack. It is not present in a vanilla stroom. The language can be quite complex so if non-standard format logs are being parsed, it may be worth speaking to your stroom sysadmin team to at least get an initial parser configured for your data.
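
As a minimal sketch of the language (based on the standard simple CSV case, not tied to any particular feed), a parser that splits a stream into lines and each line into comma-delimited values looks like:

<?xml version="1.1" encoding="UTF-8"?>
<dataSplitter xmlns="data-splitter:3" version="3.0">
  <!-- Match each line, using a newline as the delimiter -->
  <split delimiter="\n">
    <!-- Pass the matched line (group 1, without the delimiter) to the inner split -->
    <group value="$1">
      <!-- Match each comma-delimited value and output it as a data element -->
      <split delimiter=",">
        <data value="$1"/>
      </split>
    </group>
  </split>
</dataSplitter>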

Stroom also has a built-in parser for JSON fragments. This can be set either by using the CombinedParser and setting its type property to JSON or, preferably, by just using the JSONParser.

The parser has several minor limitations. The most significant is that it’s unable to deal with records that are interleaved. This occasionally happens within multi-line syslog records where a syslog server receives the first x lines of record A followed by the first y lines of record B, then the rest of record A and finally the rest of record B (or the start of record C etc). If data is likely to arrive like this then some sort of pre-processing within the source system would be necessary to ensure that each record is a contiguous block before being forwarded to stroom.

The other main limitation of the parser is actually its flexibility. If large streams are forwarded to stroom and one or more regexes within the parser have been written inefficiently or incorrectly, then it’s quite possible for the parser to try to read the entire stream in one go rather than a single record or part of a record. This will slow down the overall processing and may even cause memory issues in the worst cases. This is one of the reasons why the stroom team would prefer to be involved in the production of any non-standard parsers as mentioned above.

XSLT

The actual translation takes the XML document produced by the parser and converts it to a new XML document format in what’s known as “stroom schema format”. The latest schema is documented at XML Schemas / event-logging / event-logging v3.5.2 within the tree browser. The version is likely to change over time so you should aim to use the latest non-beta version.

Other Pipeline Elements

The pipeline functionality is flexible in that multiple XSLTs may be used in sequence to add decoration (e.g. Job Title, Grade, Employee type etc. from an HR reference database), schema validation and other business-related tasks. However, this is outside the scope of this document and pipelines should not be altered unless agreed with the stroom sysadmins. As an example, we’ve seen instances of people removing schema validation tasks from a pipeline so that processing appears to occur without error. In practice, this just breaks things further down the processing chain.

Translation Basics

Assuming you have a simple pipeline containing a working parser and an empty XSLT, the output of the parser will look something like this:

<?xml version="1.1" encoding="UTF-8"?>
<records
    xmlns="records:2"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="records:2 file://records-v2.0.xsd"
    version="2.0">
  <record>
    <data value="2022-04-06 15:45:38.737" />
    <data value="fakeuser2" />
    <data value="192.168.0.1" />
    <data value="1000011" />
    <data value="200" />
    <data value="Query success" />
    <data value="1" />
  </record>
</records>

The data nodes within the record node will differ as it’s possible to have nested data nodes as well as named data nodes, but for a non-JSON and non-XML fragment source data format, the top-level structure will be similar.
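
For instance, a hypothetical record mixing named and nested data nodes might look like:

<record>
  <data name="connection">
    <data name="ip" value="192.168.0.1" />
    <data name="port" value="443" />
  </data>
  <data name="user" value="fakeuser2" />
</record>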

The XSLT needed to recognise and start processing the above example data needs to do several things. The following initial XSLT provides the minimum required function:

<?xml version="1.1" encoding="UTF-8" ?>
<xsl:stylesheet
    xpath-default-namespace="records:2" 
    xmlns="event-logging:3" 
    xmlns:stroom="stroom"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    version="2.0">

  <xsl:template match="records">
    <Events
        xsi:schemaLocation="event-logging:3 file://event-logging-v3.5.2.xsd" Version="3.5.2">
      <xsl:apply-templates />
    </Events>
  </xsl:template>

  <xsl:template match="record">
    <Event>
      ...
    </Event>
  </xsl:template>
</xsl:stylesheet>

The following lists the necessary functions of the XSLT, along with the line numbers where they’re implemented in the above example:

  • Match the source namespace - line 3;
  • Specify the output namespace - lines 4, 12;
  • Specify the namespace for any functions - lines 5-8;
  • Match the top-level records node - line 10;
  • Provide any output in stroom schema format - lines 11, 14, 18-20;
  • Individually match subsequent record nodes - line 17.

This XSLT will generate the following output data:

<?xml version="1.1" encoding="UTF-8"?>
<Events
    xmlns="event-logging:3"
    xmlns:stroom="stroom"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="event-logging:3 file://event-logging-v3.5.2.xsd"
    Version="3.5.2">
  <Event>
    ...
  </Event>
  ...
</Events>

It’s normally best to get this part of the XSLT stepping correctly before getting any further into the code.

Similarly for JSON fragments, the output of the parser will look like:

<?xml version="1.1" encoding="UTF-8"?>
<map xmlns="http://www.w3.org/2013/XSL/json">
  <map>
    <string key="version">0</string>
    <string key="id">2801bbff-fafa-4427-32b5-d38068d3de73</string>
    <string key="detail-type">xyz_event</string>
    <string key="source">my.host.here</string>
    <string key="account">223592823261</string>
    <string key="time">2022-02-15T11:01:36Z</string>
    <array key="resources" />
    <map key="detail">
      <number key="id">1644922894335</number>
      <string key="userId">testuser</string>
    </map>
  </map>
</map>

The following initial XSLT will carry out the same tasks as before:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet
    xpath-default-namespace="http://www.w3.org/2013/XSL/json"
    xmlns="event-logging:3"
    xmlns:stroom="stroom"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    version="2.0">

  <xsl:template match="/map">
    <Events
        xsi:schemaLocation="event-logging:3 file://event-logging-v3.5.2.xsd" Version="3.5.2">
      <xsl:apply-templates />
    </Events>
  </xsl:template>

  <xsl:template match="/map/map">
    <Event>
      ...
    </Event>
  </xsl:template>
</xsl:stylesheet>

As before, the following lists the necessary functions of the XSLT, along with the line numbers where they’re implemented in the above example:

  • Match the source namespace - line 3;
  • Specify the output namespace - lines 4, 12;
  • Specify the namespace for any functions - lines 5-8;
  • Match the top-level /map node - line 10;
  • Provide any output in stroom schema format - lines 11, 14, 18-20;
  • Individually match subsequent /map/map nodes - line 17.

This XSLT will generate the following output data which is identical to the previous output:

<?xml version="1.1" encoding="UTF-8"?>
<Events
    xmlns="event-logging:3"
    xmlns:stroom="stroom"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="event-logging:3 file://event-logging-v3.5.2.xsd"
    Version="3.5.2">
  <Event>
    ...
  </Event>
  ...
</Events>

Once the initial XSLT is correct, it’s a fairly simple matter to populate the correct nodes using standard XSLT functions and a knowledge of XPaths.

Extending the Translation to Populate Specific Nodes

The above examples of <xsl:template match="..."> for an Event all point to a specific path within the XML document - often at /records/record or at /map/map. XPath references to nodes further down inside the record should normally be made relative to this node.

Depending on the output format from the parser, there are two ways of referencing a field to populate an output node.

If the intermediate XML is of the following format:

<record>
  <data value="2022-04-06 15:45:38.737" />
  <data value="fakeuser2" />
  <data value="192.168.0.1" />
  ...
</record>

Then the developer needs to understand which field contains what data and reference each field by its index, e.g.:

<IPAddress>
  <xsl:value-of select="data[3]/@value"/>
</IPAddress>

However, if the intermediate XML is of this format:

<record>
  <data name="time" value="2022-04-06 15:45:38.737" />
  <data name="user" value="fakeuser2" />
  <data name="ip" value="192.168.0.1" />
  ...
</record>

Then, although the first method is still acceptable, it’s easier and safer to reference by @name:

<IPAddress>
  <xsl:value-of select="data[@name='ip']/@value"/>
</IPAddress>

This second method also has the advantage that if the field positions differ for different event types, the names will hopefully stay the same, saving the need to add if TypeA then do X, if TypeB then do Y, ... code into the XSLT.
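
For example, given two hypothetical record layouts for different event types, the same select works even though the field positions differ:

<!-- Event type A: ip is the third field -->
<record>
  <data name="time" value="2022-04-06 15:45:38.737" />
  <data name="user" value="fakeuser2" />
  <data name="ip" value="192.168.0.1" />
</record>

<!-- Event type B: ip is the first field -->
<record>
  <data name="ip" value="192.168.0.1" />
  <data name="action" value="logon" />
  <data name="user" value="fakeuser2" />
</record>

In both cases, data[@name='ip']/@value selects the address without any per-type logic.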

More complex field references are likely to be required at times, particularly for data that’s been converted using the internal JSON parser. Assuming source data of:

<map>
  <string key="version">0</string>
  ...
  <array key="resources" />
  <map key="detail">
    <number key="id">1644922894335</number>
    <string key="userId">testuser</string>
  </map>
</map>

Then selecting the id field requires something like:

<xsl:value-of select="map[@key='detail']/number[@key='id']"/>

It’s important at this stage to have a reasonable understanding of which fields in the source data provide what detail in terms of stroom schema values, which fields can be ignored and which can be used but modified to control the flow of the translation. For example - there may be an IP address within the log, but is it of the device itself or of the client? It’s normally best to start with several examples of each event type requiring translation to ensure that fields are translated correctly.

Structuring the XSLT

There are many different ways of structuring the overall XSLT and it’s ultimately for the developer to decide the best way based upon the requirements of their own data. However, the following points should be noted:

  • When working on e.g. a CreateDocument event, it’s far easier to edit a 10-line template named CreateDocument than lines 841-850 of a template named MainTemplate. Therefore, keep each template relatively small and helpfully named.
  • Both the logic and XPaths required for EventTime and EventSource are normally common to all or most events for a given log. Therefore, it usually makes sense to have a common EventTime and EventSource template for all event types rather than a duplicate of this code for each event type (see the sketch after this list).
  • If code needs to be repeated in multiple templates, then it’s often simpler to move that code into a separate template and call it from multiple places. This is often used for e.g. adding an Outcome node for multiple failed event types.
  • Use comments within the XSLT even when the code appears obvious. If nothing else, a comment field will ensure a newline prior to the comment once auto-formatted. This allows the end of one template and the start of the next template to be differentiated more easily if each template is prefixed by something like <!-- Template for EventDetail -->. Comments are also useful for anybody who needs to fix your code several years later when you’ve moved on to far more interesting work.
  • For most feeds, the main development work is within the EventDetail node. This will normally contain a lot of code effectively doing if CreateDocument do X; if DeleteFile do Y; if SuccessfulLogin do Z; .... From experience, the following type of XSLT is normally the easiest to write and to follow:
  <!-- Event Detail template -->
  <xsl:template name="EventDetail">
    <xsl:variable name="typeId" select="..."/>
    <EventDetail>
      <xsl:choose>
        <xsl:when test="$typeId='A'">
          <xsl:call-template name="Logon"/>
        </xsl:when>
        <xsl:when test="$typeId='B'">
          <xsl:call-template name="CreateDoc"/>
        </xsl:when>
        ...
      </xsl:choose>
    </EventDetail>
  </xsl:template>
  • If, in the above example, the various values of $typeId are sufficiently descriptive to use as text values, then the TypeId node can be implemented prior to the <xsl:choose> to avoid having to specify it in each child template.
  • It’s common for systems to generate Create/Delete/View/Modify/... events against a range of different Document/File/Email/Object/... types. Rather than looking at events such as CreateDocument/DeleteFile/... and creating a template for each, it’s often simpler to work in two stages. Firstly create templates for the Create/Delete/... types within EventDetail and then from each of these templates, call another template which then checks and calls the relevant object template.
  • It’s also sometimes possible to take the above multi-step process further and use a common template for Create/Delete/View. The following code assumes that the variable $evttype holds a valid schema action such as Create/Delete/View. Whilst it can be used to produce more compact XSLT code, it tends to lose readability and makes extending the code for additional types more difficult. The inner <xsl:choose> can even be simplified again by populating an <xsl:element> with {objType} to make the code even more compact and more difficult to follow. There may occasionally be times when this sort of thing is useful but care should be taken to use it sparingly and provide plenty of comments.
  <xsl:variable name="evttype" select="..."/>
  <xsl:element name="{$evttype}">
    <xsl:choose>
      <xsl:when test="objType='Document'">
        <xsl:call-template name="Document"/>
      </xsl:when>
      <xsl:when test="objType='File'">
        <xsl:call-template name="File"/>
      </xsl:when>
      ...
    </xsl:choose>
  </xsl:element>
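
As a sketch of the common EventTime/EventSource approach mentioned in the list above (the time field name, date pattern, Generator value and feed attribute names are illustrative assumptions, not taken from a real feed), the record template might call shared templates before branching into the event-specific detail:

<!-- Each record produces one Event; the shared templates come first -->
<xsl:template match="record">
  <Event>
    <xsl:call-template name="EventTime"/>
    <xsl:call-template name="EventSource"/>
    <xsl:call-template name="EventDetail"/>
  </Event>
</xsl:template>

<!-- Common EventTime template, shared by all event types -->
<xsl:template name="EventTime">
  <EventTime>
    <TimeCreated>
      <!-- The 'time' field and its date pattern are assumptions for this sketch -->
      <xsl:value-of select="stroom:format-date(data[@name='time']/@value, 'yyyy-MM-dd HH:mm:ss.SSS')"/>
    </TimeCreated>
  </EventTime>
</xsl:template>

<!-- Common EventSource template, shared by all event types -->
<xsl:template name="EventSource">
  <EventSource>
    <System>
      <Name>
        <xsl:value-of select="stroom:feed-attribute('System')"/>
      </Name>
      <Environment>
        <xsl:value-of select="stroom:feed-attribute('Environment')"/>
      </Environment>
    </System>
    <!-- Generator is typically a fixed string naming the source application -->
    <Generator>ExampleApplication</Generator>
    <Device>
      <HostName>
        <xsl:value-of select="stroom:feed-attribute('MyHost')"/>
      </HostName>
    </Device>
  </EventSource>
</xsl:template>

The EventDetail template then branches by event type exactly as shown in the list above.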

There are always exceptions to the above advice. If a feed will only ever contain e.g. successful logins then it may be easier to create the entire event within a single template, for example. But if there’s ever a possibility of e.g. logon failures, logoffs or anything else in the future then it’s safer to structure the XSLT into separate templates.

Filtering Wanted/Unwanted Event Types

It’s common that not all received events are required to be translated. Depending upon the data being received and the auditing requirements that have been set against the source system, there are several ways to filter the events.

Remove Unwanted Events

The first method is best to use when the majority of event types are to be translated and only a few types, such as debug messages, are to be dropped. Consider the code fragment from earlier:

<xsl:template match="record">
  <Event>
    ...
  </Event>
</xsl:template>

This will create an Event node for every source record. However, if we replace this with something like:

<xsl:template match="record[data[@name='logLevel' and @value='DEBUG']]"/>

<xsl:template match="record[data[@name='msgType'
                                 and (@value='drop1' or @value='drop2')
                                ]]"/>

<xsl:template match="record">
  <Event>
    ...
  </Event>
</xsl:template>

This will filter out all DEBUG messages and messages where the msgType is either 'drop1' or 'drop2'. All other messages will result in an Event being generated.

This method is often not suited to systems where the full set of message types isn’t known prior to translation development, such as closed source software. If an unexpected message type appears in the logs then it’s likely that the translation won’t know how to deal with it and may either make incorrect assumptions about it or fail to produce a schema-compliant output.

Translate Wanted Events

This is the opposite of the previous method and the XSLT just ignores anything that it’s not expecting. This method is best used where only a few event types are of interest, such as the scenario of translating logons/logoffs from a vast range of possible types.

For this, we’d use something like:

<xsl:template match="record[data[@name='msgType'
                                   and (@value='logon' or @value='logoff')
                                  ]]">
  <Event>
    ...
  </Event>
</xsl:template>

<xsl:template match="text()"/>

The final line stops the XSLT outputting a sequence of unformatted text nodes for any unmatched event types when an <xsl:apply-templates/> is used elsewhere within the XSLT. It isn’t always needed but does no harm if present.

This method starts to become messy and difficult to understand if a large number of wanted types are to be matched.

Advanced Removal Method (With Optional Warnings)

Where the full list of event types isn’t known or may expand over time, the best method may be to filter out the definite unwanted events and handle anything unexpected as well as the known and wanted events. This would use code similar to before to drop the specific unwanted types but handle everything else including unknown types:

<xsl:template match="record[data[@name='logLevel' and @value='DEBUG']]"/>
...
<xsl:template match="record[data[@name='msgType'
                                   and (@value='drop1' or @value='drop2')
                                  ]]"/>

<xsl:template match="record">
  <Event>
    ...
  </Event>
</xsl:template>

However, the XSLT must then be able to handle unknown arbitrary event types. In practice, most systems provide a consistent format for logging the "who/where/when" and it’s only the "what" that differs between event types. Therefore, it’s usually possible to add something like this into the XSLT:

<EventDetail>
  <xsl:choose>
    <xsl:when test="$evtType='1'">
      ...
    </xsl:when>
    ...
    <xsl:when test="$evtType='n'">
      ...
    </xsl:when>
    <!-- Unknown event type -->
    <xsl:otherwise>
      <Unknown>
        <xsl:value-of select="stroom:log('WARN',concat('Unexpected Event Type - ', $evtType))"/>
        ...
      </Unknown>
    </xsl:otherwise>
  </xsl:choose>
</EventDetail>

This will create an Event of type Unknown. The Unknown node is only able to contain data name/value pairs and it should be simple to extract these directly from the intermediate XML using an <xsl:for-each>. This will allow the attributes from the source event to populate the output event for later analysis but will also generate an error stream of level WARN which will record the event type. Looking through these error streams will allow the developer to see which unexpected events have appeared then either filter them out within a top-level <xsl:template match="record[data[@name='...' and @value='...']]"/> statement or to produce an additional <xsl:when> within the EventDetail node to translate the type correctly.
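
As a minimal sketch of that extraction (assuming the intermediate XML uses data elements with name and value attributes, as in the earlier examples), the <xsl:otherwise> branch might be fleshed out as:

<xsl:otherwise>
  <Unknown>
    <!-- Record the unexpected type in the error stream at WARN level -->
    <xsl:value-of select="stroom:log('WARN', concat('Unexpected Event Type - ', $evtType))"/>
    <!-- Copy every source name/value pair into the output for later analysis -->
    <xsl:for-each select="data">
      <Data Name="{@name}" Value="{@value}"/>
    </xsl:for-each>
  </Unknown>
</xsl:otherwise>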

Common Mistakes

Performance Issues

The way that the code is written can affect its overall performance. This may not matter for low-volume logs but can greatly affect processing time for higher volumes. Consider the following example:

<!-- Event Detail template -->
<xsl:template name="EventDetail">
  <xsl:variable name="X" select="..."/>
  <xsl:variable name="Y" select="..."/>
  <xsl:variable name="Z" select="..."/>

  <EventDetail>
    <xsl:choose>
      <xsl:when test="$X='A' and $Y='foo' and matches($Z,'blah.*blah')">
        <xsl:call-template name="AAA"/>
      </xsl:when>
      <xsl:when test="$X='B' or $Z='ABC'">
        <xsl:call-template name="BBB"/>
      </xsl:when>
      ...
      <xsl:otherwise>
        <xsl:call-template name="ZZZ"/>
      </xsl:otherwise>
    </xsl:choose>
  </EventDetail>
</xsl:template>

If none of the <xsl:when> choices match, particularly if there are many of them or their logic is complex then it’ll take a significant time to reach the <xsl:otherwise> element. If this is by far the most common type of source data (i.e. none of the specific <xsl:when> elements is expected to match very often) then the XSLT will be slow and inefficient. It’s therefore better to list the most common examples first, if known.

It’s also usually better to have a hierarchy of smaller numbers of options within an <xsl:choose>. So rather than the above code, the following is likely to be more efficient:

<xsl:choose>
  <xsl:when test="$X='A'">
    <xsl:choose>
      <xsl:when test="$Y='foo'">
        <xsl:choose>
          <xsl:when test="matches($Z,'blah.*blah')">
            <xsl:call-template name="AAA"/>
          </xsl:when>
          <xsl:otherwise>
            ...
          </xsl:otherwise>
        </xsl:choose>
      </xsl:when>
      ...
    </xsl:choose>
    ...
  </xsl:when>
  ...
</xsl:choose>

Whilst this code looks more complex, it’s far more efficient to carry out a shorter sequence of checks, each based upon the result of the previous check, rather than a single consecutive list of checks where the data may only match the final check.

Where possible, the most commonly appearing choices in the source data should be dealt with first to avoid running through multiple <xsl:when> statements.

Stepping Works Fine But Errors Whilst Processing

When data is being stepped, it’s only ever fed to the XSLT as a single event, whilst a pipeline is able to process multiple events within a single input stream. This apparently minor difference sometimes results in obscure errors if the translation has incorrect XPaths specified. Taking the following input data example:

<TopLevelNode>
  <EventNode>
    <Field1>1</Field1>
    ...
  </EventNode>
  <EventNode>
    <Field1>2</Field1>
    ...
  </EventNode>
  ...
  <EventNode>
    <Field1>n</Field1>
    ...
  </EventNode>
</TopLevelNode>

If an XSLT is stepped, all XPaths will be relative to <EventNode>. To extract the value of Field1, you’d use something similar to <xsl:value-of select="Field1"/>. The following examples would also work in stepping mode or when there was only ever one Event per input stream:

<xsl:value-of select="//Field1"/>
<xsl:value-of select="../EventNode/Field1"/>
<xsl:value-of select="../*/Field1"/>
<xsl:value-of select="/TopLevelNode/EventNode/Field1"/>

However, if there’s ever a stream with multiple event nodes, pipeline processing would output the concatenation of all the Field1 node values (i.e. 12...n) for every event. Whilst it’s easy to spot the issues in these basic examples, it’s harder to see in more complex structures. It’s also worth mentioning that just because your test data only ever has a single event per stream, there’s nothing to say it’ll stay this way when operational or when the next version of the software is installed on the source system, so you should always guard against using XPaths that go to the root of the tree.

Unexpected Data Values Causing Schema Validation Errors

A source system may provide a log containing an IP address. All works fine for a while with the following code fragment:

<Client>
  <IPAddress>
    <xsl:value-of select="$ipAddress"/>
  </IPAddress>
</Client>

However, let’s assume that in certain circumstances (e.g. when accessed locally rather than over a network) the system provides a value of localhost or something else that’s not an IP address. Whilst the majority of schema values are of type string, there are still many that are limited in character set in some way. The most common is probably IPAddress and it must match a fairly complex regex to be valid. In this instance, the translation will still succeed but any schema validation elements within the pipeline will throw an error and stop the invalid event (not just the invalid element) from being output within the Events stream. Without the event in the stream, it’s not indexable or searchable so is effectively dropped by the system.

To resolve this issue, the XSLT should be aware of the possibility of invalid input using something like the following:

<Client>
  <xsl:choose>
    <xsl:when test="matches($ipAddress,'^[.0-9]+$')">
      <IPAddress>
        <xsl:value-of select="$ipAddress"/>
      </IPAddress>
    </xsl:when>
    <xsl:otherwise>
      <HostName>
        <xsl:value-of select="$ipAddress"/>
      </HostName>
    </xsl:otherwise>
  </xsl:choose>
</Client>

This would need to be modified slightly for IPv6 and also wouldn’t catch obvious errors such as 999.1..8888 but if we can assume that the source will generate either a valid IP address or a valid hostname then the events will at least be available within the output stream.
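
A slightly broader but still crude test (a sketch only, not a full validation) might also accept colon-separated hex strings as candidate IPv6 addresses:

<Client>
  <xsl:choose>
    <!-- Crude check: dotted decimal (IPv4) or colon-separated hex (IPv6).
         Schema validation remains the final arbiter of what is accepted. -->
    <xsl:when test="matches($ipAddress,'^[.0-9]+$')
                    or matches($ipAddress,'^[0-9a-fA-F:]+$')">
      <IPAddress>
        <xsl:value-of select="$ipAddress"/>
      </IPAddress>
    </xsl:when>
    <xsl:otherwise>
      <HostName>
        <xsl:value-of select="$ipAddress"/>
      </HostName>
    </xsl:otherwise>
  </xsl:choose>
</Client>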

Testing the Translation

When stepping a stream with more than a few events in it, it’s possible to filter the stepping rather than just moving to first/previous/next/last. In the bottom right hand corner of the bottom right hand pane within the XSLT tab, there’s a small filter icon that’s often not spotted. The icon will be grey if no filter is set or green if set. Opening this filter gives choices such as:

  • Jump to error
  • Jump to empty/non-empty output
  • Jump to specific XPath exists/contains/equals/unique

Each of these options can be used to move directly to the next/previous event that matches one of these attributes.

A filter on e.g. the XSLTFilter will still be active even if viewing the DSParser or any other pipeline element, although the filter that’s present in the parser step will not show any values. This may cause confusion if you lose track of which filters have been set on which steps.

Filters can be entered for multiple pipeline elements, e.g. Empty output in translationFilter and Error in schemaFilter. In this example, all empty outputs AND schema errors will be seen, effectively providing an OR of the filters.

The XPath syntax is fairly flexible. If looking for specific TypeId values, the shortcut of //TypeId will work just as well as /Events/Event/EventDetail/TypeId, for example.

Using filters will allow a developer to find a wide range of types of records far quicker than stepping through a large file of events.

2 - Apache HTTPD Event Feed

The following will take you through the process of creating an Event Feed in Stroom.

Introduction

In this example, the logs are in a well-defined, line based, text format so we will use a Data Splitter parser to transform the logs into simple record-based XML and then an XSLT translation to normalise them into the Event schema.

A separate document will describe the method of automating the storage of normalised events for this feed. Further, we will not Decorate these events. Again, Event Decoration is described in another document.

Event Log Source

For this example, we will use logs from an Apache HTTPD web server. In fact, these logs come from the web server in front of Stroom v5 and earlier.

To get the optimal information from the Apache HTTPD access logs, we define our log format based on an extension of the BlackBox format. The format is described and defined below. This is an extract from an httpd configuration file (/etc/httpd/conf/httpd.conf).


# Stroom - Black Box Auditing configuration
#
# %a - Client IP address (not hostname (%h) to ensure ip address only)
# When logging the remote host, it is important to log the client IP address, not the
# hostname. We do this with the '%a' directive. Even if HostnameLookups are turned on,
# using '%a' will only record the IP address. For the purposes of BlackBox formats,
# reversed DNS should not be trusted

# %{REMOTE_PORT}e - Client source port
# Logging the client source TCP port can provide some useful network data and can help
# one associate a single client with multiple requests.
# If two clients from the same IP address make simultaneous connections, the 'common log'
# file format cannot distinguish between those clients. Otherwise, if the client uses
# keep-alives, then every hit made from a single TCP session will be associated by the same
# client port number.
# The port information can indicate how many connections our server is handling at once,
# which may help in tuning server TCP/IP settings. It will also identify which client ports
# are legitimate requests if the administrator is examining a possible SYN-attack against a
# server.
# Note we are using the REMOTE_PORT environment variable. Environment variables only come
# into play when mod_cgi or mod_cgid is handling the request.

# %X - Connection status (use %c for Apache 1.3)
# The connection status directive tells us detailed information about the client connection.
# It returns one of three flags:
# x if the client aborted the connection before completion,
# + if the client has indicated that it will use keep-alives (and request additional URLs),
# - if the connection will be closed after the event
# Keep-Alive is an HTTP 1.1 directive that informs a web server that a client can request multiple
# files during the same connection. This way a client doesn't need to go through the overhead
# of re-establishing a TCP connection to retrieve a new file.

# %t - time - or [%{%d/%b/%Y:%T}t.%{msec_frac}t %{%z}t] for Apache 2.4
# The %t directive records the time that the request started.
# NOTE: When deployed on an Apache 2.4, or better, environment, you should use
# strftime format in order to get microsecond resolution.

# %l - remote logname

# %u - username [in quotes]
# The remote user (from auth; This may be bogus if the return status (%s) is 401
# for non-ssl services)
# For SSL services, user names need to be delivered as DNs to deliver PKI user details
# in full. To pass through PKI certificate properties in the correct form you need to
# add the following directives to your Apache configuration:
#   SSLUserName SSL_CLIENT_S_DN
#   SSLOptions +StdEnvVars
# If you cannot, then use %{SSL_CLIENT_S_DN}x in place of %u and use blackboxSSLUser
# LogFormat nickname

# %r - first line of text sent by web client [in quotes]
# This is the first line of text send by the web client, which includes the request
# method, the full URL, and the HTTP protocol.

# %s - status code before any redirection
# This is the status code of the original request.

# %>s - status code after any redirection has taken place
# This is the final status code of the request, after any internal redirections may
# have taken place.

# %D - time in microseconds to handle the request
# This is the number of microseconds the server took to handle the request

# %I - incoming bytes
# This is the number of bytes received, including the request and headers. It cannot, by definition, be zero.

# %O - outgoing bytes
# This is the size in bytes of the outgoing data, including HTTP headers. It cannot, by
# definition, be zero.

# %B - outgoing content bytes
# This is the size in bytes of the outgoing data, EXCLUDING HTTP headers. Unlike %b, which
# records '-' for zero bytes transferred, %B will record '0'.

# %{Referer}i - Referrer HTTP Request Header [in quotes]
# This is typically the URL of the page that made the request. If linked from
# e-mail or direct entry this value will be empty. Note, this can be spoofed
# or turned off

# %{User-Agent}i - User agent HTTP Request Header [in quotes]
# This is the identifying information the client (browser) reports about itself.
# It can be spoofed or turned off

# %V - the server name according to the UseCanonicalName setting
# This identifies the virtual host in a multi-host web service

# %p - the canonical port of the server servicing the request

# Define a variation of the Black Box logs
#
# Note, you only need to use the 'blackboxSSLUser' nickname if you cannot set the
# following directives for any SSL configurations
# SSLUserName SSL_CLIENT_S_DN
# SSLOptions +StdEnvVars
# You will also note the variation for no logio module. The logio module supports
# the %I and %O formatting directive
#
<IfModule mod_logio.c>
   LogFormat "%a/%{REMOTE_PORT}e %X %t %l \"%u\" \"%r\" %s/%>s %D %I/%O/%B \"%{Referer}i\" \"%{User-Agent}i\" %V/%p" blackboxUser
   LogFormat "%a/%{REMOTE_PORT}e %X %t %l \"%{SSL_CLIENT_S_DN}x\" \"%r\" %s/%>s %D %I/%O/%B \"%{Referer}i\" \"%{User-Agent}i\" %V/%p" blackboxSSLUser
</IfModule>
<IfModule !mod_logio.c>
   LogFormat "%a/%{REMOTE_PORT}e %X %t %l \"%u\" \"%r\" %s/%>s %D 0/0/%B \"%{Referer}i\" \"%{User-Agent}i\" %V/%p" blackboxUser
   LogFormat "%a/%{REMOTE_PORT}e %X %t %l \"%{SSL_CLIENT_S_DN}x\" \"%r\" %s/%>s %D 0/0/%B \"%{Referer}i\" \"%{User-Agent}i\" %V/%p" blackboxSSLUser
</IfModule>



Apache BlackBox Auditing Configuration ( Download ApacheHTTPDAuditConfig.txt )

As Stroom can use PKI for login, you can configure Stroom’s Apache to make use of the blackboxSSLUser log format. A sample set of logs in this format appear below.

192.168.4.220/61801 - [18/Jan/2020:12:39:04 -0800] - "/C=USA/ST=CA/L=Los Angeles/O=Default Company Ltd/CN=Burn Frank (burn)" "POST /stroom/stroom/dispatch.rpc HTTP/1.1" 200/200 21221 2289/415/14 "https://stroomnode00.strmdev00.org/stroom/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36" stroomnode00.strmdev00.org/443
192.168.4.220/61854 - [18/Jan/2020:12:40:04 -0800] - "/C=USA/ST=CA/L=Los Angeles/O=Default Company Ltd/CN=Burn Frank (burn)" "POST /stroom/stroom/dispatch.rpc HTTP/1.1" 200/200 7889 2289/415/14 "https://stroomnode00.strmdev00.org/stroom/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36" stroomnode00.strmdev00.org/443
192.168.4.220/61909 - [18/Jan/2020:12:41:04 -0800] - "/C=USA/ST=CA/L=Los Angeles/O=Default Company Ltd/CN=Burn Frank (burn)" "POST /stroom/stroom/dispatch.rpc HTTP/1.1" 200/200 6901 2389/3796/14 "https://stroomnode00.strmdev00.org/stroom/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36" stroomnode00.strmdev00.org/443
192.168.4.220/61962 - [18/Jan/2020:12:42:04 -0800] - "/C=USA/ST=CA/L=Los Angeles/O=Default Company Ltd/CN=Burn Frank (burn)" "POST /stroom/stroom/dispatch.rpc HTTP/1.1" 200/200 11219 2289/415/14 "https://stroomnode00.strmdev00.org/stroom/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36" stroomnode00.strmdev00.org/443
192.168.8.151/62015 - [18/Jan/2020:12:43:04 +1100] - "/C=AUS/ST=NSW/L=Sydney/O=Default Company Ltd/CN=Max Bergman (maxb)" "POST /stroom/stroom/dispatch.rpc HTTP/1.1" 200/200 4265 2289/415/14 "https://stroomnode00.strmdev00.org/stroom/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36" stroomnode00.strmdev00.org/443
192.168.8.151/62092 - [18/Jan/2020:12:44:04 +1100] - "/C=AUS/ST=NSW/L=Sydney/O=Default Company Ltd/CN=Max Bergman (maxb)" "POST /stroom/stroom/dispatch.rpc HTTP/1.1" 200/200 9791 2289/415/14 "https://stroomnode00.strmdev00.org/stroom/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36" stroomnode00.strmdev00.org/443
192.168.8.151/62147 - [18/Jan/2020:12:44:10 +1100] - "/C=AUS/ST=NSW/L=Sydney/O=Default Company Ltd/CN=Max Bergman (maxb)" "POST /stroom/stroom/dispatch.rpc HTTP/1.1" 200/200 9791 2289/415/14 "https://stroomnode00.strmdev00.org/stroom/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36" stroomnode00.strmdev00.org/443
192.168.8.151/62147 - [18/Jan/2020:12:44:20 +1100] - "/C=AUS/ST=NSW/L=Sydney/O=Default Company Ltd/CN=Max Bergman (maxb)" "POST /stroom/stroom/dispatch.rpc HTTP/1.1" 200/200 11509 2289/415/14 "https://stroomnode00.strmdev00.org/stroom/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36" stroomnode00.strmdev00.org/443
192.168.8.151/62202 - [18/Jan/2020:12:44:21 +1100] - "/C=AUS/ST=NSW/L=Sydney/O=Default Company Ltd/CN=Max Bergman (maxb)" "POST /stroom/stroom/dispatch.rpc HTTP/1.1" 200/200 4627 2389/3796/14 "https://stroomnode00.strmdev00.org/stroom/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36" stroomnode00.strmdev00.org/443
192.168.8.151/62294 - [18/Jan/2020:12:44:21 +1100] - "/C=AUS/ST=NSW/L=Sydney/O=Default Company Ltd/CN=Max Bergman (maxb)" "POST /stroom/stroom/dispatch.rpc HTTP/1.1" 200/200 12367 2289/415/14 "https://stroomnode00.strmdev00.org/stroom/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36" stroomnode00.strmdev00.org/443
192.168.8.151/62349 - [18/Jan/2020:12:44:25 +1100] - "/C=AUS/ST=NSW/L=Sydney/O=Default Company Ltd/CN=Max Bergman (maxb)" "POST /stroom/stroom/dispatch.rpc HTTP/1.1" 200/200 12765 2289/415/14 "https://stroomnode00.strmdev00.org/stroom/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36" stroomnode00.strmdev00.org/443
192.168.234.9/62429 - [18/Jan/2020:12:50:06 +0000] - "/C=GBR/ST=GLOUCESTERSHIRE/L=Bristol/O=Default Company Ltd/CN=Kostas Kosta (kk)" "POST /stroom/stroom/dispatch.rpc HTTP/1.1" 200/200 12245 2289/415/14 "https://stroomnode00.strmdev00.org/stroom/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36" stroomnode00.strmdev00.org/443
192.168.234.9/62429 - [18/Jan/2020:12:50:04 +0000] - "/C=GBR/ST=GLOUCESTERSHIRE/L=Bristol/O=Default Company Ltd/CN=Kostas Kosta (kk)" "POST /stroom/stroom/dispatch.rpc HTTP/1.1" 200/200 12245 2289/415/14 "https://stroomnode00.strmdev00.org/stroom/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36" stroomnode00.strmdev00.org/443
192.168.234.9/62495 - [18/Jan/2020:12:51:04 +0000] - "/C=GBR/ST=GLOUCESTERSHIRE/L=Bristol/O=Default Company Ltd/CN=Kostas Kosta (kk)" "POST /stroom/stroom/dispatch.rpc HTTP/1.1" 200/200 4327 2289/415/14 "https://stroomnode00.strmdev00.org/stroom/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36" stroomnode00.strmdev00.org/443
192.168.234.9/62549 - [18/Jan/2020:12:52:04 +0000] - "/C=GBR/ST=GLOUCESTERSHIRE/L=Bristol/O=Default Company Ltd/CN=Kostas Kosta (kk)" "POST /stroom/stroom/dispatch.rpc HTTP/1.1" 200/200 7148 2289/415/14 "https://stroomnode00.strmdev00.org/stroom/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36" stroomnode00.strmdev00.org/443
192.168.234.9/62626 - [18/Jan/2020:12:52:06 +0000] - "/C=GBR/ST=GLOUCESTERSHIRE/L=Bristol/O=Default Company Ltd/CN=Kostas Kosta (kk)" "POST /stroom/stroom/dispatch.rpc HTTP/1.1" 200/200 11386 2289/415/14 "https://stroomnode00.strmdev00.org/stroom/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.113 Safari/537.36" stroomnode00.strmdev00.org/443


Apache BlackBox sample log ( Download sampleApacheBlackBox.log )

Save a copy of this data to your local environment for use later in this HOWTO. Save this file as a text document with ANSI encoding.

Create the Feed and its Pipeline

To reflect the source of these Accounting Logs, we will name our feed and its pipeline Apache-SSLBlackBox-V2.0-EVENTS and it will be stored in the system group Apache HTTPD under the main system group - Event Sources.

Create System Group

To create the system group Apache HTTPD, navigate to the Event Sources/Infrastructure/WebServer system group within the Explorer pane (if this system group structure does not already exist in your Stroom instance then refer to the HOWTO Stroom Explorer Management for guidance). Left click to highlight the WebServer system group then right click to bring up the object context menu. Navigate to the New icon, then the Folder icon to reveal the New Folder selection window.

images/HOWTOs/v6/UI-ApacheHttpEventFeed-00.png

Navigate Explorer

In the New Folder window enter Apache HTTPD into the Name: text entry box.

images/HOWTOs/v6/UI-ApacheHttpEventFeed-01.png

Create System Group

Then click on Ok, at which point you will be presented with the Apache HTTPD system group configuration tab. Also note, the WebServer system group within the Explorer pane has automatically expanded to display the Apache HTTPD system group.

Close the Apache HTTPD system group configuration tab by clicking on the close icon (×) on the right-hand side of the tab.

We now need to create, in order

  • the Feed,
  • the Text Parser,
  • the Translation and finally,
  • the Pipeline.

Create Feed

Within the Explorer pane, and having selected the Apache HTTPD group, right click to bring up the object context menu. Navigate to New, Feed

images/HOWTOs/v6/UI-ApacheHttpEventFeed-03.png

Apache Create Feed

Select the Feed icon. When the New Feed selection window comes up, ensure the Apache HTTPD system group is selected or navigate to it. Then enter the name of the feed, Apache-SSLBlackBox-V2.0-EVENTS, into the Name: text entry box, then press Ok.

It should be noted that the default Stroom FeedName pattern will not accept this name. One needs to modify the stroom.feedNamePattern stroom property to change the default pattern to ^[a-zA-Z0-9_-\.]{3,}$. See the System Properties HOWTO for how to make this change.

images/HOWTOs/v6/UI-ApacheHttpEventFeed-04.png

New Feed dialog

At this point you will be presented with the new feed’s configuration tab and the feed’s Explorer object will automatically appear in the Explorer pane within the Apache HTTPD system group.

Select the Settings tab on the feed’s configuration tab. Enter an appropriate description into the Description: text entry box, for instance:

“Apache HTTPD events for BlackBox Version 2.0. These events are from a Secure service (https).”

In the Classification: text entry box, enter a Classification of the data that the event feed will contain - that is the classification or sensitivity of the accounting log’s content itself.

As this is not a Reference Feed, leave the Reference Feed: check box unchecked.

We leave the Feed Status: at Receive.

We leave the Stream Type: as Raw Events as this we will be sending batches (streams) of raw event logs.

We leave the Data Encoding: as UTF-8 as the raw logs are in this form.

We leave the Context Encoding: as UTF-8 as there are no context events for this feed.

We leave the Retention Period: at Forever as we do not want to delete the raw logs.

This results in

images/HOWTOs/v6/UI-ApacheHttpEventFeed-05.png

New Feed tab

Save the feed by clicking on the save icon.

Create Text Converter

Within the Explorer pane, and having selected the Apache HTTPD system group, right click to bring up object context menu, then select:

New
Text Converter

When the New Text Converter

images/HOWTOs/v6/UI-ApacheHttpEventFeed-07.png

New Text Converter

selection window comes up, enter the name of the feed, Apache-SSLBlackBox-V2.0-EVENTS, into the Name: text entry box, then press Ok. At this point you will be presented with the new text converter’s configuration tab.

images/HOWTOs/v6/UI-ApacheHttpEventFeed-08.png

Text Converter configuration tab

Enter an appropriate description into the Description: text entry box, for instance

“Apache HTTPD events for BlackBox Version 2.0 - text converter. See Conversion for complete documentation.”

Set the Converter Type: to be Data Splitter from the drop-down menu.

images/HOWTOs/v6/UI-ApacheHttpEventFeed-09.png

Text Converter configuration settings

Save the text converter by clicking on the save icon.

Create XSLT Translation

Within the Explorer pane, and having selected the Apache HTTPD system group, right click to bring up object context menu, then select:

New
XSL Translation

When the New XSLT selection window comes up,

images/HOWTOs/v6/UI-ApacheHttpEventFeed-11.png

New XSLT

enter the name of the feed, Apache-SSLBlackBox-V2.0-EVENTS, into the Name: text entry box then press Ok . At this point you will be presented with the new XSLT’s configuration tab.

images/HOWTOs/v6/UI-ApacheHttpEventFeed-12.png

New XSLT tab

Enter an appropriate description into the Description: text entry box, for instance

“Apache HTTPD events for BlackBox Version 2.0 - translation. See Translation for complete documentation.”

images/HOWTOs/v6/UI-ApacheHttpEventFeed-13.png

New XSLT settings

Save the XSLT by clicking on the save icon.

Create Pipeline

In the process of creating this pipeline we have assumed that the Template Pipeline content pack has been loaded, so that we can Inherit a pipeline structure from this content pack and configure it to support this specific feed.

Within the Explorer pane, and having selected the Apache HTTPD system group, right click to bring up object context menu, then select:

New
Pipeline

When the New Pipeline selection window comes up, navigate to, then select the Apache HTTPD system group and then enter the name of the pipeline, Apache-SSLBlackBox-V2.0-EVENTS, into the Name: text entry box then press Ok. At this point you will be presented with the new pipeline’s configuration tab.

images/HOWTOs/v6/UI-ApacheHttpEventFeed-15.png

New Pipeline tab

As usual, enter an appropriate Description:

“Apache HTTPD events for BlackBox Version 2.0 - pipeline. This pipeline uses the standard event pipeline to store the events in the Event Store.”

images/HOWTOs/v6/UI-ApacheHttpEventFeed-16.png

New Pipeline settings

Save the pipeline by clicking on the save icon.

We now need to select the structure this pipeline will use. We need to move from the Settings sub-item on the pipeline configuration tab to the Structure sub-item. This is done by clicking on the Structure link, at which point we see

images/HOWTOs/v6/UI-ApacheHttpEventFeed-17.png

New Pipeline Structure

Next we will choose an Event Data pipeline. This is done by inheriting it from a defined set of Template Pipelines. To do this, click on the menu selection icon to the right of the Inherit From: text display box.

When the Choose item

images/HOWTOs/v6/UI-ApacheHttpEventFeed-18.png

New Pipeline inherited from

selection window appears, select from the Template Pipelines system group. In this instance, as our input data is text, we select (left click) the Event Data (Text) pipeline

images/HOWTOs/v6/UI-ApacheHttpEventFeed-19.png

New Pipeline inherited selection

then press Ok. At this point we see the inherited pipeline structure of

images/HOWTOs/v6/UI-ApacheHttpEventFeed-20.png

New Pipeline inherited structure

For the purpose of this HOWTO, we are only interested in two of the eleven (11) elements in this pipeline

  • the Text Converter labelled dsParser
  • the XSLT Translation labelled translationFilter

We now need to associate our Text Converter and Translation with the pipeline so that we can pass raw events (logs) through our pipeline in order to save them in the Event Store.

To associate the Text Converter, select the dsParser element to display.

images/HOWTOs/v6/UI-ApacheHttpEventFeed-21.png

New Pipeline associate textconverter

Now identify the Property pane (the middle pane of the pipeline configuration tab), then double click on the textConverter Property Name to display the Edit Property selection window that allows you to edit the given property

images/HOWTOs/v6/UI-ApacheHttpEventFeed-22.png

New Pipeline textconverter association

We leave the Property Source: as Inherit but we need to change the Property Value: from None to be our newly created Apache-SSLBlackBox-V2.0-EVENTS Text Converter.

To do this, position the cursor over the menu selection icon to the right of the Value: text display box and click to select. Navigate to the Apache HTTPD system group then select the Apache-SSLBlackBox-V2.0-EVENTS Text Converter

images/HOWTOs/v6/UI-ApacheHttpEventFeed-23.png

New Pipeline textconverter association

then press Ok. At this point we will see the Property Value set

images/HOWTOs/v6/UI-ApacheHttpEventFeed-24.png

New Pipeline textconverter association

Again press Ok to finish editing this property and we see that the textConverter Property has been set to Apache-SSLBlackBox-V2.0-EVENTS

images/HOWTOs/v6/UI-ApacheHttpEventFeed-25.png

New Pipeline textconverter association

We perform the same actions to associate the translation.

First, we select the translation Filter’s translationFilter element and then within the translation Filter’s Property pane we double click on the xslt Property Name to bring up the Property Editor. As before, bring up the Choose item selection window, navigate to the Apache HTTPD system group and select the Apache-SSLBlackBox-V2.0-EVENTS XSLT Translation.

images/HOWTOs/v6/UI-ApacheHttpEventFeed-26.png

New Pipeline Translation association

We leave the remaining properties in the translation Filter’s Property pane at their default values. The result is the assignment of our translation to the xslt Property.

images/HOWTOs/v6/UI-ApacheHttpEventFeed-27.png

New Pipeline Translation association

For the moment, we will not associate a decoration filter.

Save the pipeline by clicking on its save icon.

Manually load Raw Event test data

Having established the pipeline, we can now start authoring our text converter and translation. The first step is to load some Raw Event test data. Previously, in the Event Log Source section of this HOWTO, you saved a copy of the file sampleApacheBlackBox.log to your local environment. It contains only a few events as the content is consistently formatted. We could feed the test data by posting the file to Stroom’s accounting/datafeed url, but for this example we will manually load the file. Once developed, raw data is posted to the web service.

Select the Apache-SSLBlackBox-V2.0-EVENTS feed tab and select the Data sub-tab to display

images/HOWTOs/v6/UI-ApacheHttpEventFeed-29.png

Data Loading

This window is divided into three panes.

The top pane displays the Stream Table, which is a table of the latest streams that belong to the feed (clearly it’s empty).

images/HOWTOs/v6/UI-ApacheHttpEventFeed-30.png

Data Loading - Stream Table

Note that a Raw Event stream is made up of data from a single file of data or aggregation of multiple data files and also meta-data associated with the data file(s). For example, file names, file size, etc.

The middle pane displays a specific stream and any linked streams. To display a specific stream, you select it from the Stream Table above.

images/HOWTOs/v6/UI-ApacheHttpEventFeed-31.png

Data Loading - Specific Stream

The bottom pane displays the selected stream’s data or meta-data.

images/HOWTOs/v6/UI-ApacheHttpEventFeed-32.png

Data Loading - Data/Metadata

Note the Upload icon in the top left of the Stream table pane. On clicking the Upload icon, we are presented with the data Upload selection window.

images/HOWTOs/v6/UI-ApacheHttpEventFeed-33.png

Data Loading - Upload Data

As stated earlier, raw event data is normally posted as a file to the Stroom web server. As part of this posting action, a set of well-defined HTTP extra headers are sent as part of the post. These headers, in the form of key value pairs, provide additional context associated with the system sending the logs. These standard headers become Stroom feed attributes available to the Stroom translation. Common attributes are

  • System - the name of the System providing the logs
  • Environment - the environment of the system (Production, Quality Assurance, Reference, Development)
  • Feed - the feedname itself
  • MyHost - the fully qualified domain name of the system sending the logs
  • MyIPaddress - the IP address of the system sending the logs
  • MyNameServer - the name server the system resolves names through

Since our translation will want these feed attributes, we will set them in the Meta Data text entry box of the Upload selection window. Note we can skip Feed as this will automatically be assigned correctly as part of the upload action (setting it to Apache-SSLBlackBox-V2.0-EVENTS obviously). Our Meta Data: will have

  • System:LinuxWebServer
  • Environment:Production
  • MyHost:stroomnode00.strmdev00.org
  • MyIPaddress:192.168.2.245
  • MyNameServer:192.168.2.254

We select a Stream Type: of Raw Events as this data is for an Event Feed. As this is not a Reference Feed we ignore the Effective: entry box (a date/time selector).

images/HOWTOs/v6/UI-ApacheHttpEventFeed-34.png

Upload Data

We now click the Choose File button, then navigate to the location of the raw log file you downloaded earlier, sampleApacheBlackBox.log

images/HOWTOs/v6/UI-ApacheHttpEventFeed-35.png

Upload Data

then click Open to return to the Upload selection window where we can then press Ok to perform the upload.

images/HOWTOs/v6/UI-ApacheHttpEventFeed-36.png

Upload Data

An Alert dialog window is presented

images/HOWTOs/v6/UI-ApacheHttpEventFeed-37.png

Alert
which should be closed.

The stream we have just loaded will now be displayed in the Streams Table pane. Note that the Specific Stream and Data/Meta-data panes are still blank.

images/HOWTOs/v6/UI-ApacheHttpEventFeed-38.png

Data Loading - Streams Table

If we select the stream by clicking anywhere along its line, the stream is highlighted and the Specific Stream and Data/Meta-data panes now display data.

images/HOWTOs/v6/UI-ApacheHttpEventFeed-39.png

Data Loading - Streams Table

The Specific Stream pane only displays the Raw Event stream and the Data/Meta-data pane displays the content of the log file just uploaded (the Data link). If we were to click on the Meta link at the top of the Data/Meta-data pane, the log data is replaced by this stream’s meta-data.

images/HOWTOs/v6/UI-ApacheHttpEventFeed-40.png

Data Loading - Meta-data

Note that, in addition to the feed attributes we set, the upload process added additional feed attributes of

  • Feed - the feed name
  • ReceivedTime - the time the feed was received by Stroom
  • RemoteFile - the name of the file loaded
  • StreamSize - the size, in bytes, of the loaded data within the stream
  • user-agent - the user agent used to present the stream to Stroom - in this case, the Stroom User Interface

We now have data that will allow us to develop our text converter and translation.

Step data through Pipeline - Source

We now need to step our data through the pipeline.

To do this, set the check-box on the Specific Stream pane and note that the previously grayed out action icons are now enabled.

images/HOWTOs/v6/UI-ApacheHttpEventFeed-43.png

Select Stream to Step

We now want to step our data through the first element of the pipeline, the Text Converter. We enter Stepping Mode by pressing the stepping button found at the bottom right corner of the Data/Meta-data pane.

We will then be requested to choose a pipeline to step with, at which point we navigate to the Apache-SSLBlackBox-V2.0-EVENTS pipeline as per

images/HOWTOs/v6/UI-ApacheHttpEventFeed-44.png

Select pipeline to Step

then press Ok .

At this point, we enter the pipeline Stepping tab

images/HOWTOs/v6/UI-ApacheHttpEventFeed-45.png

pipeline Stepping tab - Source

which initially displays the Raw Event data from our stream. This is the Source display for the Event Pipeline.

Step data through Pipeline - Text Converter

We click on the text.svg DSParser element to enter the Text Converter stepping window.

images/HOWTOs/v6/UI-ApacheHttpEventFeed-46.png

pipeline Stepping tab - Text Converter

This stepping tab is divided into three sub-panes. The top one is the Text Converter editor and it will allow you to edit the text conversion. The bottom left window displays the input to the Text Converter. The bottom right window displays the output from the Text Converter for the given input.

We also note an error indicator in the editor pane - the black back-grounded x and the rectangular black boxes to the right of the editor’s scroll bar.

images/HOWTOs/v6/UI-ApacheHttpEventFeed-47.png

pipeline Stepping tab - Error

In essence, this means that we have no text converter to pass the Raw Event data through.

To correct this, we will author our text converter using the Data Splitter language. Normally this is done incrementally to more easily develop the parser. The minimum text converter contains

<?xml version="1.1" encoding="UTF-8"?>
<dataSplitter xmlns="data-splitter:3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="data-splitter:3 file://data-splitter-v3.0.1.xsd" version="3.0">
    <split delimiter="\n">
        <group>
            <regex pattern="^(.*)$">
                <data name="rest" value="$1" />
            </regex>
        </group>
    </split>
</dataSplitter>

If we now press the Step First fast-backward-green.svg icon, the error will disappear and the stepping window will show.

images/HOWTOs/v6/UI-ApacheHttpEventFeed-48.png

pipeline Stepping tab - Text Converter Simple A

As we can see, the first line of our Raw Event is displayed in the input pane. The output pane holds the converted XML, where we have a single data element with a name attribute of rest and a value attribute containing the complete raw event, since our regular expression matched the entire line.

The next incremental step in the parser would be to parse out additional data elements. For example, in this next iteration we extract the client IP address and the client port, and hold the rest of the event in the rest data element.

With the text converter containing

<?xml version="1.1" encoding="UTF-8"?>
<dataSplitter xmlns="data-splitter:3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="data-splitter:3 file://data-splitter-v3.0.1.xsd" version="3.0">
    <split delimiter="\n">
        <group>
            <regex pattern="^([^/]+)/([^  ]+) (.*)$">
                <data name="clientip"  value="$1" />
                <data name="clientport"  value="$2" />
                <data name="rest" value="$3" />
            </regex>
        </group>
    </split>
</dataSplitter>

and a click on the Refresh Current Step refresh-green.svg icon, we will see that the output pane contains

images/HOWTOs/v6/UI-ApacheHttpEventFeed-49.png

Text Converter Simple B

We continue this incremental parsing until we have our complete parser.
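
For instance, a sketch of a possible third iteration (the field order is taken from the complete parser below) might additionally capture the connection status and request time; the enclosing split and group elements are unchanged:

            <regex pattern="^([^/]+)/([^ ]+) ([^ ]+) \[([^\]]+)] (.*)$">
                <data name="clientip" value="$1" />
                <data name="clientport" value="$2" />
                <data name="constatus" value="$3" />
                <data name="time" value="$4" />
                <data name="rest" value="$5" />
            </regex>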

The following is our complete Text Converter, which generates XML records as defined by the Stroom records v3.0 schema.

<?xml version="1.1" encoding="UTF-8"?>
<dataSplitter 
    xmlns="data-splitter:3" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:schemaLocation="data-splitter:3 file://data-splitter-v3.0.1.xsd" 
    version="3.0">

<!-- CLASSIFICATION: UNCLASSIFIED -->

<!-- Release History:
Release 20131001, 1 Oct 2013 - Initial release 

General Notes: 
This data splitter takes audit events for the Stroom variant of the Black Box Apache Auditing.

Event Format: The following is extracted from the Configuration settings for the Stroom variant of the Black Box Apache Auditing format.

#  Stroom - Black Box Auditing configuration
#
#  %a - Client IP address (not hostname (%h) to ensure IP address only)
#  When logging the remote host, it is important to log the client IP address, not the
#  hostname. We do this with the '%a' directive. Even if HostnameLookups are turned on,
#  using '%a' will only record the IP address. For the purposes of BlackBox formats,
#  reversed DNS should not be trusted

#  %{REMOTE_PORT}e - Client source port
#  Logging the client source TCP port can provide some useful network data and can help
#  one associate a single client with multiple requests.
#  If two clients from the same IP address make simultaneous connections, the 'common log'
#  file format cannot distinguish between those clients. Otherwise, if the client uses
#  keep-alives, then every hit made from a single TCP session will be associated by the same
#  client port number.
#  The port information can indicate how many connections our server is handling at once,
#  which may help in tuning server TCP/IP settings. It will also identify which client ports
#  are legitimate requests if the administrator is examining a possible SYN-attack against a
#  server.
#  Note we are using the REMOTE_PORT environment variable. Environment variables only come
#  into play when mod_cgi or mod_cgid is handling the request.

#  %X - Connection status (use %c for Apache 1.3)
#  The connection status directive tells us detailed information about the client connection.
#  It returns one of three flags:
#  x if the client aborted the connection before completion,
#  + if the client has indicated that it will use keep-alives (and request additional URLs),
#  - if the connection will be closed after the event
#  Keep-Alive is an HTTP 1.1 directive that informs a web server that a client can request multiple
#  files during the same connection. This way a client doesn't need to go through the overhead
#  of re-establishing a TCP connection to retrieve a new file.

#  %t - time - or [%{%d/%b/%Y:%T}t.%{msec_frac}t %{%z}t] for Apache 2.4
#  The %t directive records the time that the request started.
#  NOTE: When deployed on an Apache 2.4, or better, environment, you should use
#  strftime format in order to get microsecond resolution.

#  %l - remote logname
#

#  %u - username [in quotes]
#  The remote user (from auth; this may be bogus if the return status (%s) is 401
#  for non-SSL services)
#  For SSL services, user names need to be delivered as DNs to deliver PKI user details
#  in full. To pass through PKI certificate properties in the correct form you need to
#  add the following directives to your Apache configuration:
#  SSLUserName SSL_CLIENT_S_DN
#  SSLOptions +StdEnvVars
#  If you cannot, then use %{SSL_CLIENT_S_DN}x in place of %u and use the blackboxSSLUser
#  LogFormat nickname

#  %r - first line of text sent by web client [in quotes]
#  This is the first line of text sent by the web client, which includes the request
#  method, the full URL, and the HTTP protocol.

#  %s - status code before any redirection
#  This is the status code of the original request.

#  %>s - status code after any redirection has taken place
#  This is the final status code of the request, after any internal redirections may
#  have taken place.

#  %D - time in microseconds to handle the request
#  This is the number of microseconds the server took to handle the request

#  %I - incoming bytes
#  This is the bytes received, including request and headers. It cannot, by definition, be zero.

#  %O - outgoing bytes
#  This is the size in bytes of the outgoing data, including HTTP headers. It cannot, by
#  definition, be zero.

#  %B - outgoing content bytes
#  This is the size in bytes of the outgoing data, EXCLUDING HTTP headers. Unlike %b, which
#  records '-' for zero bytes transferred, %B will record '0'.

#  %{Referer}i - Referrer HTTP Request Header [in quotes]
#  This is typically the URL of the page that made the request. If linked from
#  e-mail or direct entry this value will be empty. Note, this can be spoofed
#  or turned off

#  %{User-Agent}i - User agent HTTP Request Header [in quotes]
#  This is the identifying information the client (browser) reports about itself.
#  It can be spoofed or turned off

#  %V - the server name according to the UseCanonicalName setting
#  This identifies the virtual host in a multi-host web service

#  %p - the canonical port of the server servicing the request

#  Define a variation of the Black Box logs
#
#  Note, you only need to use the 'blackboxSSLUser' nickname if you cannot set the
#  following directives for any SSL configurations
#  SSLUserName SSL_CLIENT_S_DN
#  SSLOptions +StdEnvVars
#  You will also note the variation for no logio module. The logio module supports
#  the %I and %O formatting directives
#

<IfModule mod_logio.c>
LogFormat "%a/%{REMOTE_PORT}e %X %t %l \"%u\" \"%r\" %s/%>s %D %I/%O/%B \"%{Referer}i\" \"%{User-Agent}i\" %V/%p" blackboxUser
LogFormat "%a/%{REMOTE_PORT}e %X %t %l \"%{SSL_CLIENT_S_DN}x\" \"%r\" %s/%>s %D %I/%O/%B \"%{Referer}i\" \"%{User-Agent}i\" %V/%p" blackboxSSLUser
</IfModule>
<IfModule !mod_logio.c>
LogFormat "%a/%{REMOTE_PORT}e %X %t %l \"%u\" \"%r\" %s/%>s %D 0/0/%B \"%{Referer}i\" \"%{User-Agent}i\" %V/%p" blackboxUser
LogFormat "%a/%{REMOTE_PORT}e %X %t %l \"%{SSL_CLIENT_S_DN}x\" \"%r\" %s/%>s %D 0/0/%B \"%{Referer}i\" \"%{User-Agent}i\" %V/%p" blackboxSSLUser
</IfModule>
-->

<!-- Match line -->
<split delimiter="\n">
    <group>
        <regex pattern="^([^/]+)/([^ ]+) ([^ ]+) \[([^\]]+)] ([^ ]+) &#34;([^&#34;]+)&#34; &#34;([^&#34;]+)&#34; (\d+)/(\d+) (\d+) ([^/]+)/([^/]+)/(\d+) &#34;([^&#34;]+)&#34; &#34;([^&#34;]+)&#34; ([^/]+)/([^ ]+)">
            <data name="clientip"  value="$1" />
            <data name="clientport"  value="$2" />
            <data name="constatus" value="$3" />
            <data  name="time" value="$4"  />
            <data  name="remotelname" value="$5"  />
            <data  name="user" value="$6" />
            <data  name="url" value="$7">
                <group value="$7" ignoreErrors="true">
                <!-- 
                Special case the "GET  /" url string as opposed to  the  more standard  "method url protocol/protocol_version".
                Also special  case a url  of "-"  which occurs  on   some   errors  (eg 408)
                -->
                    <regex pattern="^-$">
                        <data  name="url" value="error" />
                    </regex>
                    <regex pattern="^([^ ]+) (/)$">
                        <data  name="httpMethod" value="$1"  />
                        <data  name="url" value="$2" />
                    </regex>
                    <regex pattern="^([^ ]+) ([^  ]+) ([^ /]*)/([^  ]*)">
                        <data  name="httpMethod" value="$1"  />
                        <data  name="url" value="$2" />
                        <data  name="protocol" value="$3" />
                        <data  name="version" value="$4" />
                    </regex>
                </group>
            </data>
            <data  name="responseB" value="$8"  />
            <data  name="response" value="$9" />
            <data  name="timeM" value="$10" />
            <data  name="bytesIn" value="$11" />
            <data  name="bytesOut" value="$12"  />
            <data  name="bytesOutContent" value="$13" />
            <data name="referer"  value="$14" />
            <data  name="userAgent" value="$15"  />
            <data  name="vserver" value="$16" />
            <data name="vserverport"  value="$17" />
        </regex>
    </group>
</split>
</dataSplitter>


ApacheHTTPD BlackBox - Data Splitter ( Download ApacheHTTPDBlackBox-DataSplitter.txt )

If we now press the Step First fast-backward-green.svg icon, we will see the complete parsed record

images/HOWTOs/v6/UI-ApacheHttpEventFeed-50.png

pipeline Stepping tab - Text Converter Complete

If we click on the Step Forward step-forward-green.svg icon we will see the next event displayed in both the input and output panes.

images/HOWTOs/v6/UI-ApacheHttpEventFeed-51.png

pipeline Stepping tab - Text Converter Complete second event

If we click on the Step Last fast-forward-green.svg icon, we will see the last event displayed in both the input and output panes.

images/HOWTOs/v6/UI-ApacheHttpEventFeed-52.png

pipeline Stepping tab - Text Converter Complete last event

You should take note of the stepping key displayed in each stepping window. The stepping key is the set of numbers enclosed in square brackets, e.g. [7556:1:16], found in the top right-hand side of the stepping window next to the stepping icons

images/HOWTOs/v6/UI-ApacheHttpEventFeed-53.png

pipeline Stepping tab - Stepping Key

The form of these keys is [ streamId ‘:’ subStreamId ‘:’ recordNo]

where

  • streamId - is the stream ID and won’t change when stepping through the selected stream.
  • subStreamId - is the sub stream ID. When Stroom processes event streams it aggregates multiple input files and this is the file number.
  • recordNo - is the record number within the sub stream.

One can double click on either the subStreamId or recordNo numbers and enter a new number; for example, editing the recordNo of the key [7556:1:16] jumps straight to a different record within stream 7556. This allows you to ‘step’ around a stream rather than just relying on first, previous, next and last movements.

Note, you should now Save save.svg your edited Text Converter.

Step data through Pipeline - Translation

To start authoring the XSLT Translation Filter, select the xslt.svg translationFilter element, which steps us to the XSLT Translation Filter pane.

images/HOWTOs/v6/UI-ApacheHttpEventFeed-54.png

pipeline Stepping tab - Translation Initial

As for the Text Converter stepping tab, this tab is divided into three sub-panes. The top one is the XSLT translation editor, which allows you to edit the XSLT translation. The bottom left window displays the input to the XSLT translation (which is the output from the Text Converter). The bottom right window displays the output from the XSLT Translation filter for the given input.

We now click on the pipeline Step Forward button step-forward-green.svg to single step the Text Converter records element data through our XSLT Translation. We see no change, as an empty translation just performs a copy of the input data.
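
The effect is akin to that of the classic XSLT identity transform, which copies every attribute and node through unchanged. A reference sketch (standard XSLT, not specific to Stroom) is

<!-- Identity transform: copy attributes and nodes through unchanged -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
</xsl:template>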

To correct this, we will author our xslt translation. Like the Data Splitter this is also authored incrementally. A minimum xslt translation might contain

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet 
    xpath-default-namespace="records:2" 
    xmlns="event-logging:3" 
    xmlns:stroom="stroom" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema" 
    version="3.0">

  <!-- Ingest the records tree -->
  <xsl:template match="records">
    <Events xsi:schemaLocation="event-logging:3 file://event-logging-v3.2.3.xsd" Version="3.2.3">
        <xsl:apply-templates />
    </Events>
  </xsl:template>

    <!-- Only generate events if we have an url on input -->
    <xsl:template match="record[data[@name = 'url']]">
        <Event>
            <xsl:apply-templates select="." mode="eventTime" />
            <xsl:apply-templates select="." mode="eventSource" />
            <xsl:apply-templates select="." mode="eventDetail" />
        </Event>
    </xsl:template>

    <xsl:template match="node()"  mode="eventTime">
        <EventTime>
            <TimeCreated/>
        </EventTime>
    </xsl:template>

    <xsl:template match="node()"  mode="eventSource">
        <EventSource>
            <System>
                <Name  />
                <Environment />
            </System>
            <Generator />
            <Device />
            <Client />
            <Server />
            <User>
                <Id />
            </User>
        </EventSource>
    </xsl:template>

    <xsl:template match="node()"  mode="eventDetail">
        <EventDetail>
            <TypeId>SendToWebService</TypeId>
            <Description />
            <Classification />
            <Send />
        </EventDetail>
    </xsl:template>
</xsl:stylesheet>
images/HOWTOs/v6/UI-ApacheHttpEventFeed-55.png

Translation Minimal

Clearly this doesn’t generate useful events. Our first iterative change might be to generate the TimeCreated element value. The change would be

    <xsl:template match="node()" mode="eventTime">
        <EventTime>
          <TimeCreated>
             <xsl:value-of select="stroom:format-date(data[@name = 'time']/@value, 'dd/MMM/yyyy:HH:mm:ss XX')" /> 
          </TimeCreated>
        </EventTime>
    </xsl:template>
images/HOWTOs/v6/UI-ApacheHttpEventFeed-56.png

Translation Minimal+
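
Note that the pattern passed to stroom:format-date() must match the raw time string exactly (dd/MMM/yyyy:HH:mm:ss followed by a numeric zone offset, as produced by the Apache %t directive). If some records could arrive without a time value, a hedged variant might guard the conversion and fall back to the stream's ReceivedTime feed attribute - the choice of fallback here is illustrative only:

    <xsl:template match="node()" mode="eventTime">
        <EventTime>
          <TimeCreated>
            <xsl:choose>
              <!-- Convert the captured time when the parser provided one -->
              <xsl:when test="string-length(data[@name = 'time']/@value) > 0">
                <xsl:value-of select="stroom:format-date(data[@name = 'time']/@value, 'dd/MMM/yyyy:HH:mm:ss XX')" />
              </xsl:when>
              <!-- Illustrative fallback: the time Stroom received the stream -->
              <xsl:otherwise>
                <xsl:value-of select="stroom:feed-attribute('ReceivedTime')" />
              </xsl:otherwise>
            </xsl:choose>
          </TimeCreated>
        </EventTime>
    </xsl:template>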

Adding in the EventSource elements (without ANY error checking!) as per

    <xsl:template match="node()"  mode="eventSource">
        <EventSource>
            <System>
              <Name>
                <xsl:value-of select="stroom:feed-attribute('System')"  />
              </Name>
              <Environment>
                <xsl:value-of select="stroom:feed-attribute('Environment')"  />
              </Environment>
            </System>
            <Generator>Apache  HTTPD</Generator>
            <Device>
              <HostName>
                <xsl:value-of select="stroom:feed-attribute('MyHost')"  />
              </HostName>
              <IPAddress>
                <xsl:value-of select="stroom:feed-attribute('MyIPAddress')"  />
              </IPAddress>
            </Device>
            <Client>
              <IPAddress>
                <xsl:value-of select="data[@name =  'clientip']/@value"  />
              </IPAddress>
              <Port>
                <xsl:value-of select="data[@name =  'clientport']/@value"  />
              </Port>
            </Client>
            <Server>
              <HostName>
                <xsl:value-of select="data[@name =  'vserver']/@value"  />
              </HostName>
              <Port>
                <xsl:value-of select="data[@name =  'vserverport']/@value"  />
              </Port>
            </Server>
            <User>
              <Id>
                <xsl:value-of select="data[@name='user']/@value" />
              </Id>
            </User>
        </EventSource>
    </xsl:template>

And after a Refresh Current Step refresh-green.svg we see our output event ‘grow’ to

images/HOWTOs/v6/UI-ApacheHttpEventFeed-57.png

Translation Minimal++
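
Before we do, it is worth sketching what minimal error checking might look like. For example, a hedged variant of the Client element could emit the Port element only when the parser actually captured a non-empty value:

            <Client>
              <IPAddress>
                <xsl:value-of select="data[@name = 'clientip']/@value" />
              </IPAddress>
              <!-- Only emit Port when a non-empty client port was captured -->
              <xsl:if test="string-length(data[@name = 'clientport']/@value) > 0">
                <Port>
                  <xsl:value-of select="data[@name = 'clientport']/@value" />
                </Port>
              </xsl:if>
            </Client>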

We now complete our translation by expanding the EventDetail elements, giving the completed translation below (again with limited error checking and non-existent documentation!)

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet 
    xpath-default-namespace="records:2" 
    xmlns="event-logging:3" 
    xmlns:stroom="stroom" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema" 
    version="3.0">

  <!-- Ingest the records tree -->
  <xsl:template match="records">
    <Events xsi:schemaLocation="event-logging:3 file://event-logging-v3.2.3.xsd" Version="3.2.3">
        <xsl:apply-templates />
    </Events>
  </xsl:template>

    <!-- Only generate events if we have an url on input -->
    <xsl:template match="record[data[@name = 'url']]">
        <Event>
            <xsl:apply-templates select="." mode="eventTime" />
            <xsl:apply-templates select="." mode="eventSource" />
            <xsl:apply-templates select="." mode="eventDetail" />
        </Event>
    </xsl:template>

    <xsl:template match="node()" mode="eventTime">
        <EventTime>
          <TimeCreated>
             <xsl:value-of select="stroom:format-date(data[@name = 'time']/@value, 'dd/MMM/yyyy:HH:mm:ss XX')" /> 
          </TimeCreated>
        </EventTime>
    </xsl:template>

    <xsl:template match="node()"  mode="eventSource">
        <EventSource>
            <System>
              <Name>
                <xsl:value-of select="stroom:feed-attribute('System')"  />
              </Name>
              <Environment>
                <xsl:value-of select="stroom:feed-attribute('Environment')"  />
              </Environment>
            </System>
            <Generator>Apache  HTTPD</Generator>
            <Device>
              <HostName>
                <xsl:value-of select="stroom:feed-attribute('MyHost')"  />
              </HostName>
              <IPAddress>
                <xsl:value-of select="stroom:feed-attribute('MyIPAddress')"  />
              </IPAddress>
            </Device>
            <Client>
              <IPAddress>
                <xsl:value-of select="data[@name =  'clientip']/@value"  />
              </IPAddress>
              <Port>
                <xsl:value-of select="data[@name =  'clientport']/@value"  />
              </Port>
            </Client>
            <Server>
              <HostName>
                <xsl:value-of select="data[@name =  'vserver']/@value"  />
              </HostName>
              <Port>
                <xsl:value-of select="data[@name =  'vserverport']/@value"  />
              </Port>
            </Server>
            <User>
              <Id>
                <xsl:value-of select="data[@name='user']/@value" />
              </Id>
            </User>
        </EventSource>
    </xsl:template>


    <!-- -->
    <xsl:template match="node()"  mode="eventDetail">
        <EventDetail>
          <TypeId>SendToWebService</TypeId>
          <Description>Send/Access data to Web Service</Description>
          <Classification>
            <Text>UNCLASSIFIED</Text>
          </Classification>
          <Send>
            <Source>
              <Device>
                <IPAddress>
                    <xsl:value-of select="data[@name = 'clientip']/@value"/>
                </IPAddress>
                <Port>
                    <xsl:value-of select="data[@name = 'vserverport']/@value"/>
                </Port>
              </Device>
            </Source>
            <Destination>
              <Device>
                <HostName>
                    <xsl:value-of select="data[@name = 'vserver']/@value"/>
                </HostName>
                <Port>
                    <xsl:value-of select="data[@name = 'vserverport']/@value"/>
                </Port>
              </Device>
            </Destination>
            <Payload>
              <Resource>
                <URL>
                    <xsl:value-of select="data[@name = 'url']/@value"/>
                </URL>
                <Referrer>
                    <xsl:value-of select="data[@name = 'referer']/@value"/>
                </Referrer>
                <HTTPMethod>
                    <xsl:value-of select="data[@name = 'url']/data[@name = 'httpMethod']/@value"/>
                </HTTPMethod>
                <HTTPVersion>
                    <xsl:value-of select="data[@name = 'url']/data[@name = 'version']/@value"/>
                </HTTPVersion>
                <UserAgent>
                    <xsl:value-of select="data[@name = 'userAgent']/@value"/>
                </UserAgent>
                <InboundSize>
                    <xsl:value-of select="data[@name = 'bytesIn']/@value"/>
                </InboundSize>
                <OutboundSize>
                    <xsl:value-of select="data[@name = 'bytesOut']/@value"/>
                </OutboundSize>
                <OutboundContentSize>
                    <xsl:value-of select="data[@name = 'bytesOutContent']/@value"/>
                </OutboundContentSize>
                <RequestTime>
                    <xsl:value-of select="data[@name = 'timeM']/@value"/>
                </RequestTime>
                <ConnectionStatus>
                    <xsl:value-of select="data[@name = 'constatus']/@value"/>
                </ConnectionStatus>
                <InitialResponseCode>
                    <xsl:value-of select="data[@name = 'responseB']/@value"/>
                </InitialResponseCode>
                <ResponseCode>
                    <xsl:value-of select="data[@name = 'response']/@value"/>
                </ResponseCode>
                <Data Name="Protocol">
                  <xsl:attribute select="data[@name = 'url']/data[@name = 'protocol']/@value" name="Value"/>
                </Data>
              </Resource>
            </Payload>
            <!-- Normally our translation at this point would contain an <Outcome> element.
            Since all our sample data includes only successful outcomes we have omitted the <Outcome> element
            in the translation to minimise complexity-->
          </Send>
        </EventDetail>
    </xsl:template>
</xsl:stylesheet>


Apache BlackBox Translation XSLT ( Download ApacheHTTPDBlackBox-TranslationXSLT.txt )

And after a Refresh Current Step refresh-green.svg we see the completed <EventDetail> section of our output event

images/HOWTOs/v6/UI-ApacheHttpEventFeed-58.png

Translation Complete
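
As the comment at the end of the translation notes, an <Outcome> element has been omitted because the sample data contains only successful requests. Had failures been present, a hedged sketch of an Outcome derived from the HTTP response code, placed within the <Send> element, might be

            <!-- Illustrative only: flag responses of 400 and above as failures -->
            <xsl:if test="number(data[@name = 'response']/@value) >= 400">
              <Outcome>
                <Success>false</Success>
                <Description>
                  <xsl:value-of select="concat('HTTP status ', data[@name = 'response']/@value)" />
                </Description>
              </Outcome>
            </xsl:if>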

Note, you should now Save save.svg your edited xslt Translation.

We have now completed the translation and, with it, the development of our Apache-SSLBlackBox-V2.0-EVENTS event feed.

At this point, this event feed is set up to accept Raw Event data, but it will not automatically process the raw data and hence it will not place events into the Event Store. To have Stroom automatically process Raw Event streams, you will need to enable Processors for this pipeline.

3 - Event Processing

This HOWTO is provided to assist users in setting up Stroom to process inbound raw event logs and transform them into the Stroom Event Logging XML Schema.

Introduction


This HOWTO will demonstrate the process by which an Event Processing pipeline for a given Event Source is developed and deployed.

The sample event source used will be based on BlueCoat Proxy logs. An extract of BlueCoat logs was sourced from log-sharing.dreamhosters.com (a Public Security Log Sharing Site) but modified to add sample user attribution.

Template pipelines are being used to simplify the establishment of this processing pipeline.

The sample BlueCoat Proxy log will be transformed into an intermediate simple XML key value pair structure, then into the Stroom Event Logging XML Schema format.

Assumptions

The following assumptions are used in this document.

  1. The user has successfully deployed Stroom
  2. The following Stroom content packages have been installed:
    • Template Pipelines
    • XML Schemas

Event Source

As mentioned, we will use BlueCoat Proxy logs as a sample event source. Although BlueCoat logs can be customised, the default is to use the W3C Extended Log File Format (ELFF). Our sample data set looks like

#Software: SGOS 3.2.4.28
#Version: 1.0
#Date: 2005-04-27 20:57:09
#Fields: date time time-taken c-ip sc-status s-action sc-bytes cs-bytes cs-method cs-uri-scheme cs-host cs-uri-path cs-uri-query cs-username s-hierarchy s-supplier-name rs(Content-Type) cs(User-Agent) sc-filter-result sc-filter-category x-virus-id s-ip s-sitename x-virus-details x-icap-error-code x-icap-error-details
2005-05-04 17:16:12 1 45.110.2.82 200 TCP_HIT 941 729 GET http www.inmobus.com /wcm/assets/images/imagefileicon.gif - george DIRECT 38.112.92.20 image/gif "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" PROXIED none - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:12 2 45.110.2.82 200 TCP_HIT 941 729 GET http www.inmobus.com /wcm/assets/images/imagefileicon.gif - george DIRECT 38.112.92.20 image/gif "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" PROXIED none - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:12 2 45.110.2.82 200 TCP_HIT 941 729 GET http www.inmobus.com /wcm/assets/images/imagefileicon.gif - george DIRECT 38.112.92.20 image/gif "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" PROXIED none - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:12 1 45.110.2.82 200 TCP_HIT 941 729 GET http www.inmobus.com /wcm/assets/images/imagefileicon.gif - george DIRECT 38.112.92.20 image/gif "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" PROXIED none - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:12 1 45.110.2.82 200 TCP_HIT 941 729 GET http www.inmobus.com /wcm/assets/images/imagefileicon.gif - george DIRECT 38.112.92.20 image/gif "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" PROXIED none - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:12 1 45.110.2.82 200 TCP_HIT 941 729 GET http www.inmobus.com /wcm/assets/images/imagefileicon.gif - george DIRECT 38.112.92.20 image/gif "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" PROXIED none - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:12 51 45.14.4.127 200 TCP_NC_MISS 926 1104 GET http images.google.com /imgres ?imgurl=http://www.bettercomponents.be/images/linux-logo.gif&imgrefurl=http://www.bettercomponents.be/index.php%253FcPath%253D96&h=360&w=327&sz=132&tbnid=UKfPlBMXgToJ:&tbnh=117&tbnw=106&hl=en&prev=/images%253Fq%253Dlinux%252Blogo%2526hl%253Den%2526lr%253D&frame=small sally DIRECT images.google.com text/html "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/312.1 (KHTML, like Gecko) Safari/312" PROXIED Hacking/Proxy%20Avoidance - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:12 2 45.110.2.82 200 TCP_HIT 941 729 GET http www.inmobus.com /wcm/assets/images/imagefileicon.gif - george DIRECT 38.112.92.20 image/gif "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" PROXIED none - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:12 1 45.110.2.82 200 TCP_HIT 941 729 GET http www.inmobus.com /wcm/assets/images/imagefileicon.gif - george DIRECT 38.112.92.20 image/gif "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" PROXIED none - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:12 2 45.110.2.82 200 TCP_HIT 941 729 GET http www.inmobus.com /wcm/assets/images/imagefileicon.gif - george DIRECT 38.112.92.20 image/gif "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" PROXIED none - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:12 1 45.110.2.82 200 TCP_HIT 941 729 GET http www.inmobus.com /wcm/assets/images/imagefileicon.gif - george DIRECT 38.112.92.20 image/gif "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" PROXIED none - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:12 2 45.110.2.82 200 TCP_HIT 941 729 GET http www.inmobus.com /wcm/assets/images/imagefileicon.gif - george DIRECT 38.112.92.20 image/gif "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" PROXIED none - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:12 1 45.110.2.82 200 TCP_HIT 941 729 GET http www.inmobus.com /wcm/assets/images/imagefileicon.gif - george DIRECT 38.112.92.20 image/gif "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" PROXIED none - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:12 1 45.110.2.82 200 TCP_HIT 941 729 GET http www.inmobus.com /wcm/assets/images/imagefileicon.gif - george DIRECT 38.112.92.20 image/gif "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" PROXIED none - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:12 1 45.110.2.82 200 TCP_HIT 941 729 GET http www.inmobus.com /wcm/assets/images/imagefileicon.gif - george DIRECT 38.112.92.20 image/gif "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" PROXIED none - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:12 1 45.110.2.82 200 TCP_HIT 941 729 GET http www.inmobus.com /wcm/assets/images/imagefileicon.gif - george DIRECT 38.112.92.20 image/gif "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" PROXIED none - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:12 98 45.14.3.52 200 TCP_HIT 14258 321 GET http www.cedardalechurch.ca /birdscp2.gif - brad DIRECT 209.135.103.13 image/gif "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" PROXIED none - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:12 1 45.110.2.82 200 TCP_HIT 941 729 GET http www.inmobus.com /wcm/assets/images/imagefileicon.gif - george DIRECT 38.112.92.20 image/gif "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" PROXIED none - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:12 2 45.110.2.82 200 TCP_HIT 941 729 GET http www.inmobus.com /wcm/assets/images/imagefileicon.gif - george DIRECT 38.112.92.20 image/gif "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" PROXIED none - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:12 2717 45.110.2.82 200 TCP_NC_MISS 3926 1051 GET http www.inmobus.com /wcm/isocket/iSocket.cfm ?requestURL=http://www.inmobus.com/wcm/html/../isocket/image_manager_search.cfm?dsn=InmobusWCM&projectid=26&SetModule=WCM&iSocketAction=response&responseContainer=leftTopDiv george DIRECT www.inmobus.com text/html;%20charset=UTF-8 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" PROXIED none - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:12 1 45.110.2.82 200 TCP_HIT 941 729 GET http www.inmobus.com /wcm/assets/images/imagefileicon.gif - george DIRECT 38.112.92.20 image/gif "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" PROXIED none - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:12 1 45.110.2.82 200 TCP_HIT 941 729 GET http www.inmobus.com /wcm/assets/images/imagefileicon.gif - george DIRECT 38.112.92.20 image/gif "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" PROXIED none - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:12 47 45.14.4.127 200 TCP_NC_MISS 2620 926 GET http images.google.com /images ?q=tbn:UKfPlBMXgToJ:http://www.bettercomponents.be/images/linux-logo.gif jane DIRECT images.google.com image/jpeg "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/312.1 (KHTML, like Gecko) Safari/312" PROXIED Hacking/Proxy%20Avoidance - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:12 1 45.110.2.82 200 TCP_HIT 941 729 GET http www.inmobus.com /wcm/assets/images/imagefileicon.gif - george DIRECT 38.112.92.20 image/gif "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" PROXIED none - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:13 139 45.112.2.73 207 TCP_NC_MISS 819 418 PROPFIND http idisk.mac.com /patrickarnold/Public/Show - bill DIRECT idisk.mac.com text/xml;charset=utf-8 "WebDAVFS/1.2.7 (01278000) Darwin/7.8.0 (Power Macintosh)" PROXIED Computers/Internet - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:13 2 45.106.2.66 200 TCP_HIT 559 348 GET http aim-charts.pf.aol.com / ?action=aim&fields=snpghlocvAa&syms=INDEX:COMPX,INDEX:INDU,INDEX:INX,TWX sally DIRECT 205.188.136.217 text/plain "AIM/30 (Mozilla 1.24b; Windows; I; 32-bit)" PROXIED Web%20Communications - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:13 9638 45.106.3.71 200 TCP_NC_MISS 46052 1921 POST http home.silverstar.com /cgi-bin/mailman.cgi - carol DIRECT home.silverstar.com text/html "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050317 Firefox/1.0.2" PROXIED Computers/Internet - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:16:13 173 45.112.2.73 207 TCP_NC_MISS 647 436 PROPFIND http idisk.mac.com /patrickarnold/Public/Show/nuvio_05_what.swf - bill DIRECT idisk.mac.com text/xml;charset=utf-8 "WebDAVFS/1.2.7 (01278000) Darwin/7.8.0 (Power Macintosh)" PROXIED Computers/Internet - 192.16.170.42 SG-HTTP-Service - none -
2005-05-04 17:17:26 495 45.108.2.100 401 TCP_NC_MISS 1007 99884 PUT http idisk.mac.com /fayray_account_transfer_holding_area_for_pictures_to_homepage_temporary/Documents/85bT9bmviawEbbBb4Sie/Image-2743371ABCC011D9.jpg - - DIRECT idisk.mac.com text/html;charset=iso-8859-1 "DotMacKit/1.1 (10.4.0; iPho)" PROXIED Computers/Internet - 192.16.170.42 SG-HTTP-Service - none -


Sample BlueCoat logs ( Download sampleBluecoat.log )

Later in this HOWTO, one will be required to upload this file. If you save this file now, ensure the file is saved as a text document with ANSI encoding.

Establish the Processing Pipeline

We will create the components that make up the processing pipeline for transforming these raw logs into the Stroom Event Logging XML Schema. They will be placed in a folder appropriately named BlueCoat in the path System/Event Sources/Proxy. See Folder Creation for details on creating such a folder.

There will be four components

  • the Event Feed to group the BlueCoat log files
  • the Text Converter to convert the BlueCoat raw logs files into simple XML
  • the XSLT Translation to translate the simple XML formed by the Text Converter into the Stroom Event Logging XML form, and
  • the Processing pipeline which manages how the processing is performed.

All components will have the same Name, BlueCoat-Proxy-V1.0-EVENTS. It should be noted that the default Stroom FeedName pattern will not accept this name (its lower-case letters and . character are not matched by the default pattern). One needs to modify the stroom.feedNamePattern Stroom property to change the default pattern to ^[a-zA-Z0-9_-\.]{3,}$. See the System Properties HOWTO for how to make this change.

Create the Event Feed

We first select (with a left click) the System/Event Sources/Proxy/BlueCoat folder in the Explorer tab then right click and select:

add.svg
New
document/Feed.svg
Feed

This will open the New Feed configuration window into which we enter BlueCoat-Proxy-V1.0-EVENTS into the Name: entry box

images/HOWTOs/UI-FeedProcessing-00.png

Stroom UI Create Feed - New feed configuration window enter name

and press Ok to see the new Event Feed tab

images/HOWTOs/UI-FeedProcessing-01.png

Stroom UI Create Feed - New feed tab

and it’s corresponding reference in the Explorer display.

The configuration items for an Event Feed are

  • Description - a description of the feed
  • Classification - the classification or sensitivity of the Event Feed data
  • Reference Feed Flag - to indicate if this is a Reference Feed or not
  • Feed Status - which indicates if we accept data, reject it or silently drop it
  • Stream Type - to indicate if the Feed contains raw log data or reference data
  • Data Encoding - the character encoding of the data being sent to the Feed
  • Context Encoding - the character encoding of context data associated with this Feed
  • Retention Period - the amount of time to retain the Event data

In our example, we will set the above to

  • Description - BlueCoat Proxy log data sent in W3C Extended Log File Format (ELFF)
  • Classification - We will leave this blank
  • Reference Feed Flag - We leave the check-box unchecked as this is not a Reference Feed
  • Feed Status - We set to Receive
  • Stream Type - We set to Raw Events as we will be sending batches (streams) of raw event logs
  • Data Encoding - We leave at the default of UTF-8 as this is the proposed character encoding
  • Context Encoding - We leave at the default of UTF-8 as there are no Context Events for this Feed
  • Retention Period - We leave at Forever as we do not want to delete any collected BlueCoat event data.
images/HOWTOs/UI-FeedProcessing-02.png

Stroom UI Create Feed - New feed tab configuration

One should note that the Feed tab Feed.svg * BlueCoat-Proxy-V1.0-EVENTS × has been marked as having unsaved changes. This is indicated by the asterisk character * between the Feed icon document/Feed.svg and the name of the feed BlueCoat-Proxy-V1.0-EVENTS.

We can save the changes to our feed by pressing the Save icon save.svg in the top left of the BlueCoat-Proxy-V1.0-EVENTS tab. At this point one should notice two things: the first is that the asterisk has disappeared from the Feed tab and the second is that the Save icon save.svg is now disabled.

images/HOWTOs/UI-FeedProcessing-03.png

Stroom UI Create Feed - New feed tab saved

Create the Text Converter

We now create the Text Converter for this Feed in a similar fashion to the Event Feed. We first select (with a left click) the System/Event Sources/Proxy/BlueCoat folder in the Explorer tab then right click and select

add.svg
New
document/TextConverter.svg
Text Converter

Enter BlueCoat-Proxy-V1.0-EVENTS into the Name: entry box and press the Ok which results in the creation of the Text Converter tab

images/HOWTOs/UI-FeedProcessing-04.png

Stroom UI Create Feed - New TextConverter tab

and it’s corresponding reference in the Explorer display.

We set the configuration for this Text Converter to be

  • Description - Simple XML transform for BlueCoat Proxy log data sent in W2C Extended Log File Format (ELFF)
  • Converter Type - We set to Data Splitter as we will be using the Stroom Data Splitter facility to convert the raw log data into simple XML.

Again, press the Save icon save.svg to save the configuration items.

Create the XSLT Translation

We now create the XSLT translation for this Feed in a similar fashion to the Event Feed or Text Converter. We first select (with a left click) the System/Event Sources/Proxy/BlueCoat folder in the Explorer tab then right click and select:

add.svg
New
document/XSLT.svg
XSL Translation

Enter BlueCoat-Proxy-V1.0-EVENTS into the Name: entry box and press the Ok which results in the creation of the XSLT Translation tab

images/HOWTOs/UI-FeedProcessing-05.png

Stroom UI Create Feed - New Translation tab

and it’s corresponding reference in the Explorer display.

We set the configuration for this XSLT Translation to be

  • Description - Transform simple XML of BlueCoat Proxy log data into Stroom Event Logging XML form

Again, press the Save icon save.svg to save the configuration items.

Create the Pipeline

We now create the Pipeline for this Feed in a similar fashion to the Event Feed, Text Converter or XSLT Translation. We first select (with a left click) the System/Event Sources/Proxy/BlueCoat folder in the Explorer tab then right click and select:

add.svg
New
document/Pipeline.svg
Pipeline

Enter BlueCoat-Proxy-V1.0-EVENTS into the Name: entry box and press the Ok which results in the creation of the Pipeline tab

images/HOWTOs/UI-FeedProcessing-06.png

Stroom UI Create Feed - New Pipeline tab

and it’s corresponding reference in the Explorer display.

We set the configuration for this Pipeline to be

  • Description - Processing of XML of BlueCoat Proxy log data into Stroom Event Logging XML
  • Type - We leave as Event Data as this is an Event Data pipeline

Configure Pipeline Structure

We now need to configure the Structure of this Pipeline.

We do this by selecting the Structure hyper-link of the *BlueCoat-Proxy-V1.0-EVENTS Pipeline tab.

At this point we see the Pipeline Structure configuration tab

images/HOWTOs/UI-FeedProcessing-07.png

Stroom UI Create Feed - Pipeline Structure tab

As noted in the Assumptions at the start, we have loaded the Template Pipeline content pack, so that we can Inherit a pipeline structure from this content pack and configure it to support this specific feed.

We find a template by selecting the Inherit From: None popup.png entry box to reveal a Choose Item configuration item window.

images/HOWTOs/UI-FeedProcessing-08.png

Stroom UI Create Feed - Pipeline Structure tab - Inherit

Select the Template Pipelines folder by pressing the tree-closed.svg icon to the left of the folder to reveal the choice of available templates.

images/HOWTOs/UI-FeedProcessing-09.png

Stroom UI Create Feed - Pipeline Structure tab - Templates

For our BlueCoat feed we will select the Event Data (Text) template. This is done by moving the cursor to the relevant line and selecting it via a left click

images/HOWTOs/UI-FeedProcessing-10.png

Stroom UI Create Feed - Pipeline Structure tab - Template Selection

then pressing Ok to see the inherited pipeline structure

images/HOWTOs/UI-FeedProcessing-11.png

Stroom UI Create Feed - Pipeline Structure tab - Template Selected

Configure Pipeline Elements

For the purpose of this HOWTO, we are only interested in two of the eleven (11) elements in this pipeline

  • the Text Converter labeled dsParser
  • the XSLT Translation labeled translationFilter

We need to assign our BlueCoat-Proxy-V1.0-EVENTS Text Converter and XSLT Translation to these elements respectively.

Text Converter Configuration

We do this by first selecting (left click) the dsParser element, at which point we see the Property sub-window displayed

images/HOWTOs/UI-FeedProcessing-12.png

Stroom UI Create Feed - Pipeline Structure tab - dsParser

We then select (left click) the textConverter Property Name

images/HOWTOs/UI-FeedProcessing-13.png

Stroom UI Create Feed - Pipeline Structure tab - dsParser selected Property

then press the Edit Property button edit.svg . At this point, the Edit Property configuration window is displayed.

images/HOWTOs/UI-FeedProcessing-14.png

Stroom UI Create Feed - Pipeline Structure tab - dsParser Edit Property

We select the Value: None popup.png entry box to reveal a Choose Item configuration item window.

images/HOWTOs/UI-FeedProcessing-15.png

Stroom UI Create Feed - Pipeline Structure tab - dsParser Edit Property choose item

We traverse the folder structure until we can select the BlueCoat-Proxy-V1.0-EVENTS Text Converter as per

images/HOWTOs/UI-FeedProcessing-16.png

Stroom UI Create Feed - Pipeline Structure tab - dsParser Edit Property chosen item

and then press Ok to see that the Property Value: has been selected.

images/HOWTOs/UI-FeedProcessing-17.png

Stroom UI Create Feed - Pipeline Structure tab - dsParser set Property chosen item

and pressing the Ok button of the Edit Property configuration window results in the pipeline's dsParser property being set.

images/HOWTOs/UI-FeedProcessing-18.png

Stroom UI Create Feed - Pipeline Structure tab - dsParser set Property

XSLT Translation Configuration

We do this by first selecting (left click) the translationFilter element, at which point we see the Property sub-window displayed

images/HOWTOs/UI-FeedProcessing-19.png

Stroom UI Create Feed - Pipeline Structure tab - translationFilter

We then select (left click) the xslt Property Name

images/HOWTOs/UI-FeedProcessing-20.png

Stroom UI Create Feed - Pipeline Structure tab - xslt selected Property

and following the same steps as for the Text Converter property selection, we assign the BlueCoat-Proxy-V1.0-EVENTS XSLT Translation to the xslt property.

images/HOWTOs/UI-FeedProcessing-21.png

Stroom UI Create Feed - Pipeline Structure tab - xslt selected Property

At this point, we save these changes by pressing the Save icon save.svg .

Authoring the Translation

We are now ready to author the translation. Close all tabs except for the Welcome and BlueCoat-Proxy-V1.0-EVENTS Feed tabs.

On the BlueCoat-Proxy-V1.0-EVENTS Feed tab, select the Data hyper-link to be presented with the Data pane of our tab.

images/HOWTOs/UI-FeedProcessing-22.png

Stroom UI Create Feed - Translation - Data Pane

Although we can post our test data set to this feed, we will manually upload it via the Data pane. To do this we press the Upload button upload.svg in the top Data pane to display the Upload configuration window

images/HOWTOs/UI-FeedProcessing-23.png

Stroom UI Create Feed - Translation - Data Pane Upload

In a Production situation, where we would post log files to Stroom, we would include certain HTTP Header variables that, as we shall see, will be used as part of the translation. These header variables typically provide situational awareness of the source system sending the events.

For our purposes we set the following HTTP Header variables

Environment:Development
LogFileName:sampleBluecoat.log
MyHost:"somenode.strmdev00.org"
MyIPaddress:"192.168.2.220 192.168.122.1"
MyMeta:"FQDN:somenode.strmdev00.org\nipaddress:192.168.2.220\nipaddress_eth0:192.168.2.220\nipaddress_lo:127.0.0.1\nipaddress_virbr0:192.168.122.1\n"
MyNameServer:"gateway.strmdev00.org."
MyTZ:+1000
Shar256:056f0d196ffb4bc6c5f3898962f1708886bb48e2f20a81fb93f561f4d16cb2aa
System:Site http://log-sharing.dreamhosters.com/ Bluecoat Logs
Version:V1.0

These are set by entering them into the Meta Data: entry box.

images/HOWTOs/UI-FeedProcessing-24b.png

Stroom UI Create Feed - Translation - Data Pane Upload Metadata

Having done this we select a Stream Type: of Raw Events

We leave the Effective: entry box empty as this stream of raw event logs does not have an Effective Date (only Reference Feeds set this).

And we choose our file sampleBluecoat.log by clicking on the Browse button in the File: entry box, which brings up the browser's standard file upload selection window. Having selected our file, we see

images/HOWTOs/UI-FeedProcessing-24.png

Stroom UI Create Feed - Translation - Data Pane Upload Complete

On pressing Ok, an Alert pop-up window is presented indicating the file was uploaded

images/HOWTOs/UI-FeedProcessing-25.png

Stroom UI Create Feed - Translation - Data Pane Upload Complete Verify

Again press Close to show that the data has been uploaded as a Stream into the BlueCoat-Proxy-V1.0-EVENTS Event Feed.

images/HOWTOs/UI-FeedProcessing-26.png

Stroom UI Create Feed - Translation - Data Pane Show Batch

The top pane holds a table of the latest streams that pertain to the feed. We see the one item, which is the stream we uploaded. If we select it, we see that a stream summary is also displayed in the centre pane (which shows details of the specific selected feed and associated streams). We also see that the bottom pane displays the data associated with the selected item - in this case, the first lines of content from the BlueCoat sample log file.

images/HOWTOs/UI-FeedProcessing-27.png

Stroom UI Create Feed - Translation - Data Pane Show Data

If we were to select the Meta hyper-link of the lower pane, we would see the metadata that Stroom records for this Stream of data.

images/HOWTOs/UI-FeedProcessing-28.png

Stroom UI Create Feed - Translation - MetaData Pane Show Data

You should see all the HTTP variables we set as part of the Upload step as well as some that Stroom has automatically set.

We now switch back to the Data hyper-link before we start to develop the actual translation.

Stepping the Pipeline

We will now author the two translation components of the pipeline, the data splitter that will transform our lines of BlueCoat data into a simple xml format and then the XSLT translation that will take this simple xml format and translate it into appropriate Stroom Event Logging XML form.

We start by ensuring our Raw Events Data stream is selected and we press the Enter Stepping Mode stepping.svg button on the lower right hand side of the bottom Stream Data pane.

You will be prompted to select a pipeline to step with. Choose the BlueCoat-Proxy-V1.0-EVENTS pipeline

images/HOWTOs/UI-FeedProcessing-29.png

Stroom UI Create Feed - Translation - Stepping Choose Pipeline

then press Ok .

Stepping the Pipeline - Source

You will be presented with the Source element of the pipeline that shows our selected stream’s raw data.

images/HOWTOs/UI-FeedProcessing-30.png

Stroom UI Create Feed - Translation - Stepping Source Element

We see two panes here.

The top pane displays the Pipeline structure with Source selected (we could refer to this as the stepping pane). It also displays a step indicator (three colon-separated numbers enclosed in square brackets; initially the numbers are dashes, i.e. [-:-:-], as we have yet to step) and a set of green Stepping Actions. The step indicator and Stepping Actions allow one to step through a log file, selecting data event by event (an event is typically a line, but some events can be multi-line).

The bottom pane displays the first page (up to 100 lines) of data along with a set of blue Data Selection Actions. The Data Selection Actions are used to step through the source data 100 lines at a time. When multiple source log files have been aggregated into a single stream, two sets of Data Selection Action control buttons will be offered. The right hand set will allow a user to step through the source data as before, but the left hand set allows one to step between files from the aggregated event log files.

Stepping the Pipeline - dsParser

We now select the dsParser pipeline element, which results in the window below

images/HOWTOs/UI-FeedProcessing-31.png

Stroom UI Create Feed - Translation - Stepping dsParser Element

This window is made up of four panes.

The top pane remains the same - a display of the pipeline structure and the step indicator and green Stepping Actions.

The next pane down is the editing pane for the Text Converter. This pane is used to edit the text converter that converts our line-based BlueCoat Proxy logs into an XML format. We make use of the Stroom Data Splitter facility to perform this transformation. See here for complete details on the data splitter.

The lower two panes are the input and output displays for the text converter.

The authoring of this data splitter translation is outside the scope of this HOWTO. It is recommended that one reads up on the Data Splitter and reviews the various samples found in the published Stroom Content packs, or the Pull Requests of github.com/gchq/stroom-content .

For the purpose of this HOWTO, the Data Splitter appears below. The author believes the comments should support the understanding of the transformation.

<?xml version="1.0" encoding="UTF-8"?>
<dataSplitter 
    bufferSize="5000000" 
    xmlns="data-splitter:3" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:schemaLocation="data-splitter:3 file://data-splitter-v3.0.xsd" 
    version="3.0" 
    ignoreErrors="true">

  <!-- 
  This data splitter gains the Software and Version strings, along with the log field names, from the comments section of the log file.
  That is from the lines ...
  
  #Software: SGOS 3.2.4.28
  #Version: 1.0
  #Date: 2005-04-27 20:57:09
  #Fields: date time time-taken c-ip sc-status s-action sc-bytes cs-bytes cs-method ... x-icap-error-code x-icap-error-details
  
  We use the Field values as the header for the subsequent log fields
  -->
  
  <!-- Match the software comment line and save it in _bc_software -->
  <regex id="software" pattern="^#Software: (.+) ?\n*">
    <data name="_bc_software" value="$1" />
  </regex>
  <!-- Match the version comment line and save it in _bc_version -->
  <regex id="version" pattern="^#Version: (.+) ?\n*">
    <data name="_bc_version" value="$1" />
  </regex>

  <!-- Match against a Fields: header comment and save all the field names in a headings variable -->
  
  <regex id="heading" pattern="^#Fields: (.+) ?\n*">
    <group value="$1">
      <regex pattern="^(\S+) ?\n*">
        <var id="headings" />
      </regex>
    </group>
  </regex>

  <!-- Skip all other comment lines -->
  <regex pattern="^#.+\n*">
    <var id="ignorea" />
  </regex>

  <!-- We now match all other lines, applying the headings captured at the start of the file to each field value -->
  
  <regex id="body" pattern="^[^#].+\n*">
    <group>
      <regex pattern="^&#34;([^&#34;]*)&#34; ?\n*">
        <data name="$headings$1" value="$1" />
      </regex>
      <regex pattern="^([^ ]+) *\n*">
        <data name="$headings$1" value="$1" />
      </regex>
    </group>
  </regex>

  <!-- -->
</dataSplitter>


BlueCoat dataspliter ( Download BlueCoat.ds )

It should be entered into the Text Converter’s editing pane as per

images/HOWTOs/UI-FeedProcessing-32.png

Stroom UI Create Feed - Translation - Stepping dsParser textConverter code

As mentioned earlier, to step the translation, one uses the green Stepping Actions.

The actions are

  • fast-backward-green.svg - progress the transformation to the first line of the translation input
  • step-backward-green.svg - progress the transformation one step backward
  • step-forward-green.svg - progress the transformation one step forward
  • fast-forward-green.svg - progress the transformation to the end of the translation input
  • refresh-green.svg - refresh the transformation based on the current translation input

So, if one were to press the step-forward-green.svg stepping action, we would be presented with

images/HOWTOs/UI-FeedProcessing-33.png

Stroom UI Create Feed - Translation - Stepping dsParser textConverter 1

We see that the input pane has the first line of input from our sample file and the output pane has an XML record structure where we have defined a data element with the name attribute of _bc_software and its value attribute of SGOS 3.2.4.28. The definition of the record structure can be found in the System/XML Schemas/records folder.

This is the result of the code in our editor

<!-- Match the software comment line and save it in _bc_software -->
<regex id="software" pattern="^#Software: (.+) ?\n*">
  <data name="_bc_software" value="$1" />
</regex>
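
Indicatively, the record in the output pane at this step has the form

<record>
  <data name="_bc_software" value="SGOS 3.2.4.28" />
</record>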

If one presses the step-forward-green.svg stepping action again, we see that we have moved to the second line of the input file with the resultant output of a data element with the name attribute of _bc_version and its value attribute of 1.0.

images/HOWTOs/UI-FeedProcessing-34.png

Stroom UI Create Feed - Translation - Stepping dsParser textConverter 2

Stepping forward once more causes the translation to ignore the Date comment line, define a Data Splitter $headings variable from the Fields comment line and transform the first line of actual event data.

images/HOWTOs/UI-FeedProcessing-35.png

Stroom UI Create Feed - Translation - Stepping dsParser textConverter 3

We see that a <record> element has been formed with multiple key value pair <data> elements where the name attribute is the key and the value attribute the value. You will note that the keys have been taken from the Fields comment line which were placed in the $headings variable.
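
An indicative fragment of such a record, drawing its values from the sample input shown in full later in this HOWTO, is

<record>
  <data name="date" value="2005-05-04" />
  <data name="time" value="17:16:12" />
  <data name="time-taken" value="1" />
  <data name="c-ip" value="45.110.2.82" />
  <!-- ... one data element for each remaining field ... -->
</record>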

You should also take note that the stepping indicator has been incrementing the last number, so at this point it is displaying [1:1:3].

The general form of this indicator is

'[' streamId ':' subStreamId ':' recordNo ']'

where

  • streamId - is the stream ID, which won’t change when stepping through the selected stream
  • subStreamId - is the sub stream ID. When Stroom aggregates multiple event sources for a feed, it aggregates multiple input files and this is, in effect, the file number
  • recordNo - is the record number within the sub stream
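
For example, the indicator [1:1:3] seen above denotes record 3 of sub stream 1 within the stream whose ID is 1.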

One can double click on either the subStreamId or recordNo entry and enter a new value. This allows you to jump around a stream rather than just relying on first, previous, next and last movements.

Hovering the mouse over the stepping indicator will change the cursor to a hand pointer. Selecting (by a left click) the recordNo will allow you to edit its value (and the other values for that matter). You will see the display change from

images/HOWTOs/UI-FeedProcessing-36.png

Stroom UI Create Feed - Translation - Stepping Indicator 1
to
images/HOWTOs/UI-FeedProcessing-37.png

Stroom UI Create Feed - Translation - Stepping Indicator 2

If we change the record number from 3 to 12 and then either press Enter or press the refresh-green.svg action, we see

images/HOWTOs/UI-FeedProcessing-38.png

Stroom UI Create Feed - Translation - Stepping Indicator 3

and note that a new record has been processed in the input and output panes. Further, if one steps back to the Source element of the pipeline to view the raw source file, we see that the highlighted current line is the 12th line of processed data. It is the 10th actual BlueCoat event, but remember the #Software and #Version lines are considered as processed data (2 + 10 = 12). Also note that the #Date and #Fields lines are not considered processed data, and hence do not contribute to the recordNo value.

images/HOWTOs/UI-FeedProcessing-39.png

Stroom UI Create Feed - Translation - Stepping Indicator 4

If we select the dsParser pipeline element then press the fast-forward-green.svg action we see the recordNo jump to 31 which is the last processed line of our sample log file.

images/HOWTOs/UI-FeedProcessing-40.png

Stroom UI Create Feed - Translation - Stepping Indicator 5

Stepping the Pipeline - translationFilter

We now select the translationFilter pipeline element, which results in

images/HOWTOs/UI-FeedProcessing-41.png

Stroom UI Create Feed - Translation - Stepping translationFilter Element

As for the dsParser, this window is made up of four panes.

The top pane remains the same - a display of the pipeline structure and the step indicator and green Stepping Actions.

The next pane down is the editing pane for the Translation Filter. This pane is used to edit an XSLT translation that converts our simple key value pair <records> XML structure into another XML form.

The lower two panes are the input and output displays for the XSLT translation. You will note that the input and output displays are identical, as a null XSLT translation effectively performs a direct copy.

In this HOWTO we will transform the <records> XML structure into the GCHQ Stroom Event Logging XML Schema form which is documented here .

The authoring of this XSLT translation is outside the scope of this HOWTO, as is the use of the Stroom XML Schema. It is recommended that one reads up on XSLT Conversion and the Stroom Event Logging XML Schema and reviews the various samples found in the published Stroom Context packs, or in the Pull Requests of https://github.com/gchq/stroom-content .

We will build the translation in steps. We enter an initial portion of our XSLT transformation that just consumes the Software and Version key values and converts the date and time values (which are in UTC) into the EventTime/TimeCreated element. This code segment is

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet
    xpath-default-namespace="records:2"
    xmlns="event-logging:3"
    xmlns:stroom="stroom" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    version="3.0">

  <!-- Bluecoat Proxy logs in W3C Extended Log File Format (ELF) -->

  <!-- Ingest the record key value pair elements -->
  <xsl:template match="records">
    <Events xsi:schemaLocation="event-logging:3 file://event-logging-v3.2.4.xsd" Version="3.2.4">
      <xsl:apply-templates />
    </Events>
  </xsl:template>

  <!-- Main record template for single event -->
  <xsl:template match="record">
    <xsl:choose>

      <!-- Store the Software and Version information of the Bluecoat log file for use 
      in the Event Source elements which are processed later -->
      <xsl:when test="data[@name='_bc_software']">
        <xsl:value-of select="stroom:put('_bc_software', data[@name='_bc_software']/@value)" />
      </xsl:when>
      <xsl:when test="data[@name='_bc_version']">
        <xsl:value-of select="stroom:put('_bc_version', data[@name='_bc_version']/@value)" />
      </xsl:when>

      <!-- Process the event logs -->
      <xsl:otherwise>
        <Event>
          <xsl:call-template name="event_time" />
        </Event>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Time -->
  <xsl:template name="event_time">
    <EventTime>
      <TimeCreated>
        <xsl:value-of select="concat(data[@name = 'date']/@value,'T',data[@name='time']/@value,'.000Z')" />
      </TimeCreated>
    </EventTime>
  </xsl:template>
</xsl:stylesheet>

After entering this translation, pressing the refresh-green.svg action shows the display

images/HOWTOs/UI-FeedProcessing-42.png

Stroom UI Create Feed - Translation - Stepping XSLT Translation 1
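
At this stage each event data record yields a minimal Event element of the form (the values shown are those of the sample record examined in detail later)

<Event>
  <EventTime>
    <TimeCreated>2005-05-04T17:16:12.000Z</TimeCreated>
  </EventTime>
</Event>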

Note that this is the 31st record, so if we were to jump to the first record using the fast-backward-green.svg action, we see that the input and output change appropriately.

images/HOWTOs/UI-FeedProcessing-43.png

Stroom UI Create Feed - Translation - Stepping XSLT Translation 2

You will note that there is no Event element in the output pane, as the record template in our XSLT translation above is only storing the input’s key value (_bc_software’s value).

Further note that the BlueCoat-Proxy-V1.0-EVENTS stepping tab ( stepping.svg * BlueCoat-Proxy-V1.0-EVENTS × ) has a star in front of it and also that the Save icon save.svg is highlighted. This indicates that a component of the pipeline needs to be saved - in this case, the XSLT translation.

By pressing the Save icon, you will save the XSLT translation as it currently stands; the star will be removed from the tab ( stepping.svg BlueCoat-Proxy-V1.0-EVENTS × ) and the Save icon save.svg will no longer be highlighted.

images/HOWTOs/UI-FeedProcessing-45.png

Stroom UI Create Feed - Translation - Stepping XSLT Translation 4

We next extend our translation by authoring an event_source template to form an appropriate Stroom Event Logging EventSource element structure. Thus our translation is now

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet
    xpath-default-namespace="records:2" 
    xmlns="event-logging:3" 
    xmlns:stroom="stroom"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema" 
    version="3.0">

  <!-- Bluecoat Proxy logs in W3C Extended Log File Format (ELF) -->

  <!-- Ingest the record key value pair elements -->
  <xsl:template match="records">
    <Events xsi:schemaLocation="event-logging:3 file://event-logging-v3.2.4.xsd" Version="3.2.4">
      <xsl:apply-templates />
    </Events>
  </xsl:template>

  <!-- Main record template for single event -->
  <xsl:template match="record">
    <xsl:choose>

      <!-- Store the Software and Version information of the Bluecoat log file for use in
      the Event Source elements which are processed later -->
      <xsl:when test="data[@name='_bc_software']">
        <xsl:value-of select="stroom:put('_bc_software', data[@name='_bc_software']/@value)" />
      </xsl:when>
      <xsl:when test="data[@name='_bc_version']">
        <xsl:value-of select="stroom:put('_bc_version', data[@name='_bc_version']/@value)" />
      </xsl:when>

      <!-- Process the event logs -->
      <xsl:otherwise>
        <Event>
          <xsl:call-template name="event_time" />
          <xsl:call-template name="event_source" />
        </Event>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Time -->
  <xsl:template name="event_time">
    <EventTime>
      <TimeCreated>
        <xsl:value-of select="concat(data[@name = 'date']/@value,'T',data[@name='time']/@value,'.000Z')" />
      </TimeCreated>
    </EventTime>
  </xsl:template>

  <!-- Template for event source-->
  <xsl:template name="event_source">

    <!--
    We extract some situational awareness information that the posting script includes when posting the event data 
    -->
    <xsl:variable name="_mymeta" select="translate(stroom:meta('MyMeta'),'&quot;', '')" />

    <!-- Form the EventSource node -->
    <EventSource>
      <System>
        <Name>
          <xsl:value-of select="stroom:meta('System')" />
        </Name>
        <Environment>
          <xsl:value-of select="stroom:meta('Environment')" />
        </Environment>
      </System>
      <Generator>
        <xsl:variable name="gen">
          <xsl:if test="stroom:get('_bc_software')">
            <xsl:value-of select="concat(' Software: ', stroom:get('_bc_software'))" />
          </xsl:if>
          <xsl:if test="stroom:get('_bc_version')">
            <xsl:value-of select="concat(' Version: ', stroom:get('_bc_version'))" />
          </xsl:if>
        </xsl:variable>
        <xsl:value-of select="concat('Bluecoat', $gen)" />
      </Generator>
      <xsl:if test="data[@name='s-computername'] or data[@name='s-ip']">
        <Device>
          <xsl:if test="data[@name='s-computername']">
            <Name>
              <xsl:value-of select="data[@name='s-computername']/@value" />
            </Name>
          </xsl:if>
          <xsl:if test="data[@name='s-ip']">
            <IPAddress>
              <xsl:value-of select=" data[@name='s-ip']/@value" />
            </IPAddress>
          </xsl:if>
          <xsl:if test="data[@name='s-sitename']">
            <Data Name="ServiceType" Value="{data[@name='s-sitename']/@value}" />
          </xsl:if>
        </Device>
      </xsl:if>

      <!-- -->
      <Client>
        <xsl:if test="data[@name='c-ip']/@value != '-'">
          <IPAddress>
            <xsl:value-of select="data[@name='c-ip']/@value" />
          </IPAddress>
        </xsl:if>

        <!-- Remote Port Number -->
        <xsl:if test="data[@name='c-port']/@value !='-'">
          <Port>
            <xsl:value-of select="data[@name='c-port']/@value" />
          </Port>
        </xsl:if>
      </Client>

      <!-- -->
      <Server>
        <HostName>
          <xsl:value-of select="data[@name='cs-host']/@value" />
        </HostName>
      </Server>

      <!-- -->
      <xsl:variable name="user">
        <xsl:value-of select="data[@name='cs-user']/@value" />
        <xsl:value-of select="data[@name='cs-username']/@value" />
        <xsl:value-of select="data[@name='cs-userdn']/@value" />
      </xsl:variable>
      <xsl:if test="$user !='-'">
        <User>
          <Id>
            <xsl:value-of select="$user" />
          </Id>
        </User>
      </xsl:if>
      <Data Name="MyMeta">
        <xsl:attribute name="Value" select="$_mymeta" />
      </Data>
    </EventSource>
  </xsl:template>
</xsl:stylesheet>

Stepping to the 3rd record (the first real data record in our sample log) will reveal that our output pane has gained an EventSource element.

images/HOWTOs/UI-FeedProcessing-46.png

Stroom UI Create Feed - Translation - Stepping XSLT Translation 5

Note also that our Save icon save.svg is highlighted, so we should at some point save the extensions to our translation.

The complete translation now follows.

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet 
    xpath-default-namespace="records:2" 
    xmlns="event-logging:3" 
    xmlns:stroom="stroom" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema" 
    version="3.0">

  <!-- Bluecoat Proxy logs in W3C Extended Log File Format (ELF) -->

  <!-- Ingest the record key value pair elements -->
  <xsl:template match="records">
    <Events xsi:schemaLocation="event-logging:3 file://event-logging-v3.2.4.xsd" Version="3.2.4">
      <xsl:apply-templates />
    </Events>
  </xsl:template>

  <!-- Main record template for single event -->
  <xsl:template match="record">
    <xsl:choose>

      <!-- Store the Software and Version information of the Bluecoat log file for use in the Event
      Source elements which are processed later -->
      <xsl:when test="data[@name='_bc_software']">
        <xsl:value-of select="stroom:put('_bc_software', data[@name='_bc_software']/@value)" />
      </xsl:when>
      <xsl:when test="data[@name='_bc_version']">
        <xsl:value-of select="stroom:put('_bc_version', data[@name='_bc_version']/@value)" />
      </xsl:when>

      <!-- Process the event logs -->
      <xsl:otherwise>
        <Event>
          <xsl:call-template name="event_time" />
          <xsl:call-template name="event_source" />
          <xsl:call-template name="event_detail" />
        </Event>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Time -->
  <xsl:template name="event_time">
    <EventTime>
      <TimeCreated>
        <xsl:value-of select="concat(data[@name = 'date']/@value,'T',data[@name='time']/@value,'.000Z')" />
      </TimeCreated>
    </EventTime>
  </xsl:template>

  <!-- Template for event source-->
  <xsl:template name="event_source">

    <!-- We extract some situational awareness information that the posting script includes when
      posting the event data -->
    <xsl:variable name="_mymeta" select="translate(stroom:meta('MyMeta'),'&quot;', '')" />

    <!-- Form the EventSource node -->
    <EventSource>
      <System>
        <Name>
          <xsl:value-of select="stroom:meta('System')" />
        </Name>
        <Environment>
          <xsl:value-of select="stroom:meta('Environment')" />
        </Environment>
      </System>
      <Generator>
        <xsl:variable name="gen">
          <xsl:if test="stroom:get('_bc_software')">
            <xsl:value-of select="concat(' Software: ', stroom:get('_bc_software'))" />
          </xsl:if>
          <xsl:if test="stroom:get('_bc_version')">
            <xsl:value-of select="concat(' Version: ', stroom:get('_bc_version'))" />
          </xsl:if>
        </xsl:variable>
        <xsl:value-of select="concat('Bluecoat', $gen)" />
      </Generator>
      <xsl:if test="data[@name='s-computername'] or data[@name='s-ip']">
        <Device>
          <xsl:if test="data[@name='s-computername']">
            <Name>
              <xsl:value-of select="data[@name='s-computername']/@value" />
            </Name>
          </xsl:if>
          <xsl:if test="data[@name='s-ip']">
            <IPAddress>
              <xsl:value-of select=" data[@name='s-ip']/@value" />
            </IPAddress>
          </xsl:if>
          <xsl:if test="data[@name='s-sitename']">
            <Data Name="ServiceType" Value="{data[@name='s-sitename']/@value}" />
          </xsl:if>
        </Device>
      </xsl:if>

      <!-- -->
      <Client>
        <xsl:if test="data[@name='c-ip']/@value != '-'">
          <IPAddress>
            <xsl:value-of select="data[@name='c-ip']/@value" />
          </IPAddress>
        </xsl:if>

        <!-- Remote Port Number -->
        <xsl:if test="data[@name='c-port']/@value !='-'">
          <Port>
            <xsl:value-of select="data[@name='c-port']/@value" />
          </Port>
        </xsl:if>
      </Client>

      <!-- -->
      <Server>
        <HostName>
          <xsl:value-of select="data[@name='cs-host']/@value" />
        </HostName>
      </Server>

      <!-- -->
      <xsl:variable name="user">
        <xsl:value-of select="data[@name='cs-user']/@value" />
        <xsl:value-of select="data[@name='cs-username']/@value" />
        <xsl:value-of select="data[@name='cs-userdn']/@value" />
      </xsl:variable>
      <xsl:if test="$user !='-'">
        <User>
          <Id>
            <xsl:value-of select="$user" />
          </Id>
        </User>
      </xsl:if>
      <Data Name="MyMeta">
        <xsl:attribute name="Value" select="$_mymeta" />
      </Data>
    </EventSource>
  </xsl:template>

  <!-- Event detail -->
  <xsl:template name="event_detail">
    <EventDetail>

      <!--
        We model Proxy events as either Receive or Send events depending on the method.
      
        We make use of the Receive/Send sub-elements Source/Destination to map
        the Client/Destination Proxy values and the Payload sub-element to map
        the URL and other details of the activity. If we have a query, we model
        it as a Criteria
      -->
      <TypeId>
        <xsl:value-of select="concat('Bluecoat-', data[@name='cs-method']/@value, '-', data[@name='cs-uri-scheme']/@value)" />
        <xsl:if test="data[@name='cs-uri-query']/@value != '-'">-Query</xsl:if>
      </TypeId>
      <xsl:choose>
        <xsl:when test="matches(data[@name='cs-method']/@value, 'GET|OPTIONS|HEAD')">
          <Description>Receipt of information from a Resource via Proxy</Description>
          <Receive>
            <xsl:call-template name="setupParticipants" />
            <xsl:call-template name="setPayload" />
            <xsl:call-template name="setOutcome" />
          </Receive>
        </xsl:when>
        <xsl:otherwise>
          <Description>Transmission of information to a Resource via Proxy</Description>
          <Send>
            <xsl:call-template name="setupParticipants" />
            <xsl:call-template name="setPayload" />
            <xsl:call-template name="setOutcome" />
          </Send>
        </xsl:otherwise>
      </xsl:choose>
    </EventDetail>
  </xsl:template>

  <!-- Establish the Source and Destination nodes -->
  <xsl:template name="setupParticipants">
    <Source>
      <Device>
        <xsl:if test="data[@name='c-ip']/@value != '-'">
          <IPAddress>
            <xsl:value-of select="data[@name='c-ip']/@value" />
          </IPAddress>
        </xsl:if>

        <!-- Remote Port Number -->
        <xsl:if test="data[@name='c-port']/@value !='-'">
          <Port>
            <xsl:value-of select="data[@name='c-port']/@value" />
          </Port>
        </xsl:if>
      </Device>
    </Source>
    <Destination>
      <Device>
        <HostName>
          <xsl:value-of select="data[@name='cs-host']/@value" />
        </HostName>
      </Device>
    </Destination>
  </xsl:template>

  <!-- Define the Payload node -->
  <xsl:template name="setPayload">
    <Payload>
      <xsl:if test="data[@name='cs-uri-query']/@value != '-'">
        <Criteria>
          <DataSources>
            <DataSource>
              <xsl:value-of select="concat(data[@name='cs-uri-scheme']/@value, '://', data[@name='cs-host']/@value)" />
              <xsl:if test="data[@name='cs-uri-path']/@value != '/'">
                <xsl:value-of select="data[@name='cs-uri-path']/@value" />
              </xsl:if>
            </DataSource>
          </DataSources>
          <Query>
            <Raw>
              <xsl:value-of select="data[@name='cs-uri-query']/@value" />
            </Raw>
          </Query>
        </Criteria>
      </xsl:if>
      <Resource>

        <!-- Check for auth groups the URL belongs to -->
        <xsl:variable name="authgroups">
          <xsl:value-of select="data[@name='cs-auth-group']/@value" />
          <xsl:if test="exists(data[@name='cs-auth-group']) and exists(data[@name='cs-auth-groups'])">,</xsl:if>
          <xsl:value-of select="data[@name='cs-auth-groups']/@value" />
        </xsl:variable>
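        <!-- $authgroups now holds a single group name, a comma separated list
             of groups, a '-' placeholder or an empty string -->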
        <xsl:choose>
          <xsl:when test="contains($authgroups, ',')">
            <Groups>
              <xsl:for-each select="tokenize($authgroups, ',')">
                <Group>
                  <Id>
                    <xsl:value-of select="." />
                  </Id>
                </Group>
              </xsl:for-each>
            </Groups>
          </xsl:when>
          <xsl:when test="$authgroups != '-' and $authgroups != ''">
            <Groups>
              <Group>
                <Id>
                  <xsl:value-of select="$authgroups" />
                </Id>
              </Group>
            </Groups>
          </xsl:when>
        </xsl:choose>

        <!-- Re-form the URL -->
        <URL>
          <xsl:value-of select="concat(data[@name='cs-uri-scheme']/@value, '://', data[@name='cs-host']/@value)" />
          <xsl:if test="data[@name='cs-uri-path']/@value != '/'">
            <xsl:value-of select="data[@name='cs-uri-path']/@value" />
          </xsl:if>
        </URL>
        <HTTPMethod>
          <xsl:value-of select="data[@name='cs-method']/@value" />
        </HTTPMethod>
        <xsl:if test="data[@name='cs(User-Agent)']/@value !='-'">
          <UserAgent>
            <xsl:value-of select="data[@name='cs(User-Agent)']/@value" />
          </UserAgent>
        </xsl:if>

        <!-- Inbound activity -->
        <xsl:if test="data[@name='sc-bytes']/@value !='-'">
          <InboundSize>
            <xsl:value-of select="data[@name='sc-bytes']/@value" />
          </InboundSize>
        </xsl:if>
        <xsl:if test="data[@name='sc-bodylength']/@value !='-'">
          <InboundContentSize>
            <xsl:value-of select="data[@name='sc-bodylength']/@value" />
          </InboundContentSize>
        </xsl:if>

        <!-- Outbound activity -->
        <xsl:if test="data[@name='cs-bytes']/@value !='-'">
          <OutboundSize>
            <xsl:value-of select="data[@name='cs-bytes']/@value" />
          </OutboundSize>
        </xsl:if>
        <xsl:if test="data[@name='cs-bodylength']/@value !='-'">
          <OutboundContentSize>
            <xsl:value-of select="data[@name='cs-bodylength']/@value" />
          </OutboundContentSize>
        </xsl:if>

        <!-- Miscellaneous -->
        <RequestTime>
          <xsl:value-of select="data[@name='time-taken']/@value" />
        </RequestTime>
        <ResponseCode>
          <xsl:value-of select="data[@name='sc-status']/@value" />
        </ResponseCode>
        <xsl:if test="data[@name='rs(Content-Type)']/@value != '-'">
          <MimeType>
            <xsl:value-of select="data[@name='rs(Content-Type)']/@value" />
          </MimeType>
        </xsl:if>
        <xsl:if test="data[@name='cs-categories']/@value != 'none' or data[@name='sc-filter-category']/@value != 'none'">
          <Category>
            <xsl:value-of select="data[@name='cs-categories']/@value" />
            <xsl:value-of select="data[@name='sc-filter-category']/@value" />
          </Category>
        </xsl:if>

        <!-- Take up other items as data elements -->
        <xsl:apply-templates select="data[@name='s-action']" />
        <xsl:apply-templates select="data[@name='cs-uri-scheme']" />
        <xsl:apply-templates select="data[@name='s-hierarchy']" />
        <xsl:apply-templates select="data[@name='sc-filter-result']" />
        <xsl:apply-templates select="data[@name='x-virus-id']" />
        <xsl:apply-templates select="data[@name='x-virus-details']" />
        <xsl:apply-templates select="data[@name='x-icap-error-code']" />
        <xsl:apply-templates select="data[@name='x-icap-error-details']" />
      </Resource>
    </Payload>
  </xsl:template>

  <!-- Generic Data capture template so we capture all other Bluecoat objects not already consumed -->
  <xsl:template match="data">
    <xsl:if test="@value != '-'">
      <Data Name="{@name}" Value="{@value}" />
    </xsl:if>
  </xsl:template>

  <!-- 
         Set up the Outcome node.
  
  We only set an Outcome for an error state. The absence of an Outcome infers success
  -->
  <xsl:template name="setOutcome">
    <xsl:choose>

      <!-- Favour squid specific errors first -->
      <xsl:when test="data[@name='sc-status']/@value > 500">
        <Outcome>
          <Success>false</Success>
          <Description>
            <xsl:call-template name="responseCodeDesc">
              <xsl:with-param name="code" select="data[@name='sc-status']/@value" />
            </xsl:call-template>
          </Description>
        </Outcome>
      </xsl:when>

      <!-- Now check for 'normal' errors -->
      <xsl:when test="tCliStatus > 400">
        <Outcome>
          <Success>false</Success>
          <Description>
            <xsl:call-template name="responseCodeDesc">
              <xsl:with-param name="code" select="data[@name='sc-status']/@value" />
            </xsl:call-template>
          </Description>
        </Outcome>
      </xsl:when>
    </xsl:choose>
  </xsl:template>

  <!-- Response Code map to Descriptions -->
  <xsl:template name="responseCodeDesc">
    <xsl:param name="code" />
    <xsl:choose>

      <!-- Informational -->
      <xsl:when test="$code = 100">Continue</xsl:when>
      <xsl:when test="$code = 101">Switching Protocols</xsl:when>
      <xsl:when test="$code = 102">Processing</xsl:when>

      <!-- Successful Transaction -->
      <xsl:when test="$code = 200">OK</xsl:when>
      <xsl:when test="$code = 201">Created</xsl:when>
      <xsl:when test="$code = 202">Accepted</xsl:when>
      <xsl:when test="$code = 203">Non-Authoritative Information</xsl:when>
      <xsl:when test="$code = 204">No Content</xsl:when>
      <xsl:when test="$code = 205">Reset Content</xsl:when>
      <xsl:when test="$code = 206">Partial Content</xsl:when>
      <xsl:when test="$code = 207">Multi Status</xsl:when>

      <!-- Redirection -->
      <xsl:when test="$code = 300">Multiple Choices</xsl:when>
      <xsl:when test="$code = 301">Moved Permanently</xsl:when>
      <xsl:when test="$code = 302">Moved Temporarily</xsl:when>
      <xsl:when test="$code = 303">See Other</xsl:when>
      <xsl:when test="$code = 304">Not Modified</xsl:when>
      <xsl:when test="$code = 305">Use Proxy</xsl:when>
      <xsl:when test="$code = 307">Temporary Redirect</xsl:when>

      <!-- Client Error -->
      <xsl:when test="$code = 400">Bad Request</xsl:when>
      <xsl:when test="$code = 401">Unauthorized</xsl:when>
      <xsl:when test="$code = 402">Payment Required</xsl:when>
      <xsl:when test="$code = 403">Forbidden</xsl:when>
      <xsl:when test="$code = 404">Not Found</xsl:when>
      <xsl:when test="$code = 405">Method Not Allowed</xsl:when>
      <xsl:when test="$code = 406">Not Acceptable</xsl:when>
      <xsl:when test="$code = 407">Proxy Authentication Required</xsl:when>
      <xsl:when test="$code = 408">Request Timeout</xsl:when>
      <xsl:when test="$code = 409">Conflict</xsl:when>
      <xsl:when test="$code = 410">Gone</xsl:when>
      <xsl:when test="$code = 411">Length Required</xsl:when>
      <xsl:when test="$code = 412">Precondition Failed</xsl:when>
      <xsl:when test="$code = 413">Request Entity Too Large</xsl:when>
      <xsl:when test="$code = 414">Request URI Too Large</xsl:when>
      <xsl:when test="$code = 415">Unsupported Media Type</xsl:when>
      <xsl:when test="$code = 416">Request Range Not Satisfiable</xsl:when>
      <xsl:when test="$code = 417">Expectation Failed</xsl:when>
      <xsl:when test="$code = 422">Unprocessable Entity</xsl:when>
      <xsl:when test="$code = 424">Locked/Failed Dependency</xsl:when>
      <xsl:when test="$code = 433">Unprocessable Entity</xsl:when>

      <!-- Server Error -->
      <xsl:when test="$code = 500">Internal Server Error</xsl:when>
      <xsl:when test="$code = 501">Not Implemented</xsl:when>
      <xsl:when test="$code = 502">Bad Gateway</xsl:when>
      <xsl:when test="$code = 503">Service Unavailable</xsl:when>
      <xsl:when test="$code = 504">Gateway Timeout</xsl:when>
      <xsl:when test="$code = 505">HTTP Version Not Supported</xsl:when>
      <xsl:when test="$code = 507">Insufficient Storage</xsl:when>
      <xsl:when test="$code = 600">Squid: header parsing error</xsl:when>
      <xsl:when test="$code = 601">Squid: header size overflow detected while parsing/roundcube: software configuration error</xsl:when>
      <xsl:when test="$code = 603">roundcube: invalid authorization</xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="concat('Unknown Code:', $code)" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>


BlueCoat XSLT Translation ( Download BlueCoat.xslt )

Refreshing the current event will show that the output pane contains

<?xml version="1.1" encoding="UTF-8"?>
<Events 
    xmlns="event-logging:3" 
    xmlns:stroom="stroom" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema" 
    xsi:schemaLocation="event-logging:3 file://event-logging-v3.2.4.xsd" 
    Version="3.2.4">
  <Event>
    <EventTime>
      <TimeCreated>2005-05-04T17:16:12.000Z</TimeCreated>
    </EventTime>
    <EventSource>
      <System>
        <Name>Site http://log-sharing.dreamhosters.com/ Bluecoat Logs</Name>
        <Environment>Development</Environment>
      </System>
      <Generator>Bluecoat Software: SGOS 3.2.4.28 Version: 1.0</Generator>
      <Device>
        <IPAddress>192.16.170.42</IPAddress>
        <Data Name="ServiceType" Value="SG-HTTP-Service" />
      </Device>
      <Client>
        <IPAddress>45.110.2.82</IPAddress>
      </Client>
      <Server>
        <HostName>www.inmobus.com</HostName>
      </Server>
      <User>
        <Id>george</Id>
      </User>
      <Data Name="MyMeta" Value="FQDN:somenode.strmdev00.org\nipaddress:192.168.2.220\nipaddress_eth0:192.168.2.220\nipaddress_lo:127.0.0.1\nipaddress_virbr0:192.168.122.1\n" />
    </EventSource>
    <EventDetail>
      <TypeId>Bluecoat-GET-http</TypeId>
      <Description>Receipt of information from a Resource via Proxy</Description>
      <Receive>
        <Source>
          <Device>
            <IPAddress>45.110.2.82</IPAddress>
          </Device>
        </Source>
        <Destination>
          <Device>
            <HostName>www.inmobus.com</HostName>
          </Device>
        </Destination>
        <Payload>
          <Resource>
            <URL>http://www.inmobus.com/wcm/assets/images/imagefileicon.gif</URL>
            <HTTPMethod>GET</HTTPMethod>
            <UserAgent>Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)</UserAgent>
            <InboundSize>941</InboundSize>
            <OutboundSize>729</OutboundSize>
            <RequestTime>1</RequestTime>
            <ResponseCode>200</ResponseCode>
            <MimeType>image/gif</MimeType>
            <Data Name="s-action" Value="TCP_HIT" />
            <Data Name="cs-uri-scheme" Value="http" />
            <Data Name="s-hierarchy" Value="DIRECT" />
            <Data Name="sc-filter-result" Value="PROXIED" />
            <Data Name="x-icap-error-code" Value="none" />
          </Resource>
        </Payload>
      </Receive>
    </EventDetail>
  </Event>
</Events>

for the given input

<?xml version="1.1" encoding="UTF-8"?>
<records xmlns="records:2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="records:2 file://records-v2.0.xsd" version="2.0">
  <record>
    <data name="date" value="2005-05-04" />
    <data name="time" value="17:16:12" />
    <data name="time-taken" value="1" />
    <data name="c-ip" value="45.110.2.82" />
    <data name="sc-status" value="200" />
    <data name="s-action" value="TCP_HIT" />
    <data name="sc-bytes" value="941" />
    <data name="cs-bytes" value="729" />
    <data name="cs-method" value="GET" />
    <data name="cs-uri-scheme" value="http" />
    <data name="cs-host" value="www.inmobus.com" />
    <data name="cs-uri-path" value="/wcm/assets/images/imagefileicon.gif" />
    <data name="cs-uri-query" value="-" />
    <data name="cs-username" value="george" />
    <data name="s-hierarchy" value="DIRECT" />
    <data name="s-supplier-name" value="38.112.92.20" />
    <data name="rs(Content-Type)" value="image/gif" />
    <data name="cs(User-Agent)" value="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)" />
    <data name="sc-filter-result" value="PROXIED" />
    <data name="sc-filter-category" value="none" />
    <data name="x-virus-id" value="-" />
    <data name="s-ip" value="192.16.170.42" />
    <data name="s-sitename" value="SG-HTTP-Service" />
    <data name="x-virus-details" value="-" />
    <data name="x-icap-error-code" value="none" />
    <data name="x-icap-error-details" value="-" />
  </record>
</records>
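
Note, for example, how the cs-username value george in the input has become the User/Id element of the EventSource, and how the separate date and time fields have been combined into the single TimeCreated timestamp.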

Do not forget to Save save.svg the translation now that it is complete.

Schema Validation

One last point: validation against the Stroom Event Logging Schema is performed in the schemaFilter component of the pipeline. Had our translation resulted in a malformed Event, this pipeline component would display the errors. In the screen below, we have purposely changed the EventTime/TimeCreated element to be EventTime/TimeCreatd. If one selects the schemaFilter component and then presses Refresh refresh-green.svg for the current step, we will see that

  • there is an error as indicated by a square Red box ../HOWTOs/icons/errorIndicator.png in the top right hand corner

  • there is a Red rectangle line indicator mark ../HOWTOs/icons/errorLine.png on the right hand side in the display slide bar

  • there is a Red error marker error.svg in the left hand gutter.

images/HOWTOs/UI-FeedProcessing-47.png

Stroom UI Create Feed - Translation - Stepping XSLT Translation 6
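
For reference, the deliberately errant fragment within the event_time template is of the form

<TimeCreatd>
  <xsl:value-of select="concat(data[@name = 'date']/@value,'T',data[@name='time']/@value,'.000Z')" />
</TimeCreatd>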

Hovering over the error marker error.svg on the left hand side will bring up a pop-up describing the error.

images/HOWTOs/UI-FeedProcessing-48.png

Stroom UI Create Feed - Translation - Stepping XSLT Translation 7

At this point, close the BlueCoat-Proxy-V1.0-EVENTS stepping tab, acknowledging you do not want to save your errant changes

images/HOWTOs/UI-FeedProcessing-49.png

Stroom UI Create Feed - Translation - Stepping XSLT Translation 8

by pressing the Ok button.

Automated Processing

Now that we have authored our translation, we want to enable Stroom to automatically process streams of raw event log data as it arrives. We do this by configuring a Processor in the BlueCoat-Proxy-V1.0-EVENTS pipeline.

Adding a Pipeline Processor

Open the BlueCoat-Proxy-V1.0-EVENTS pipeline by selecting it (double left click) in the Explorer display to show

images/HOWTOs/UI-FeedProcessing-50.png

Stroom UI Enable Processing

To configure a Processor we select the Processors hyper-link of the BlueCoat-Proxy-V1.0-EVENTS Pipeline tab to reveal

images/HOWTOs/UI-FeedProcessing-51.png

Stroom UI Enable Processing - Processors table

We add a Processor by pressing the add processor button add.svg in the top left hand corner. At this, you will be presented with an Add Filter configuration window.

images/HOWTOs/UI-FeedProcessing-52.png

Stroom UI Enable Processing - Add Filter 1

As we wish to create a Processor that will automatically process all BlueCoat-Proxy-V1.0-EVENTS feed Raw Events we will select the BlueCoat-Proxy-V1.0-EVENTS Feed and Raw Event Stream Type.

To select the feed, we press the Edit button edit.svg . At this, the Choose Feeds To Include And Exclude configuration window is displayed.

images/HOWTOs/UI-FeedProcessing-53.png

Stroom UI Enable Processing - Add Filter 2

As we need to Include the BlueCoat-Proxy-V1.0-EVENTS Feed in our selection, press the add.svg button in the Include: pane of the window to be presented with a Choose Item configuration window.

images/HOWTOs/UI-FeedProcessing-54.png

Stroom UI Enable Processing - Add Filter 3

Navigate to the Event Sources/Proxy/BlueCoat folder and select the BlueCoat-Proxy-V1.0-EVENTS Feed

images/HOWTOs/UI-FeedProcessing-55.png

Stroom UI Enable Processing - Add Filter 4

then press the Ok button to select and see that the feed is included.

images/HOWTOs/UI-FeedProcessing-56.png

Stroom UI Enable Processing - Add Filter 5

Again press the Ok button to close the Choose Feeds To Include And Exclude window to show that we have selected our feed in the Feeds: selection pane of the Add Filter configuration window.

images/HOWTOs/UI-FeedProcessing-57.png

Stroom UI Enable Processing - Add Filter 6

We now need to select our Stream Type. Press the add.svg button in the Stream Types: pane of the window to be presented with an Add Stream Type window with a Stream Type: selection drop down.

images/HOWTOs/UI-FeedProcessing-58.png

Stroom UI Enable Processing - Add Filter 7

We select (left click) the drop down selection to display the types of Stream we can choose

images/HOWTOs/UI-FeedProcessing-59.png

Stroom UI Enable Processing - Add Filter 8

and as we are selecting Raw Events we select that item, then press the Ok button, at which point we see that our Add Filter configuration window displays

images/HOWTOs/UI-FeedProcessing-60.png

Stroom UI Enable Processing - Add Filter 9

As we have selected our filter items, press the Ok button to display our configured Processors.

images/HOWTOs/UI-FeedProcessing-61.png

Stroom UI Enable Processing - Configured Processors

We now see our display is divided into two panes. The Processors table pane at the top and the specific Processor pane below. In our case, our filter selection has left the BlueCoat-Proxy-V1.0-EVENTS Filter selected in the Processors table

images/HOWTOs/UI-FeedProcessing-62.png

Stroom UI Enable Processing - Configured Processors - Selected Processor

and the specific filter’s details in the bottom pane.

images/HOWTOs/UI-FeedProcessing-63.png

Stroom UI Enable Processing - Configured Processors - Selected Processor Detail

The column entries in the Processors Table pane describe

  • Pipeline - the name of the Processor pipeline ( filter.svg )
  • Tracker Ms - the last time the tracker updated
  • Tracker % - the percentage of available streams completed
  • Last Poll Age - the last time the processor found new streams to process
  • Task Count - the number of processor tasks currently running
  • Priority - the queue scheduling priority of task submission to available stream processors
  • Streams - the number of streams that have been processed (includes currently running streams)
  • Events - ??
  • Status - the status of the processor. Normally empty if the stream selection is open-ended. If only a subset of streams was chosen (e.g. a time range in the filter) then the status will be Complete
  • Enabled - check box to indicate the processor is enabled

We now need only Enable both the pipeline Processor and the pipeline Filter for automatic processing to occur. We do this by selecting both check boxes in the Enabled column.

images/HOWTOs/UI-FeedProcessing-64.png

Stroom UI Enable Processing - Configured Processors - Enable Processor

If we refresh our Processor table by pressing the refresh.svg button in the top right hand corner, we will see that more table entries have been filled in.

images/HOWTOs/UI-FeedProcessing-65.png

Stroom UI Enable Processing - Configured Processors - Enable Processor Result

We see that the tracker last updated at 2018-07-14T04:00:35.289Z, the percentage complete is 100 (we only had one stream after all), the last time active streams were checked for was 2.3 minutes ago, there are no tasks running and that 1 stream has completed. Note that the Status column is blank as we have an open ended filter in that the processor will continue to select and process any new stream of Raw Events coming into the BlueCoat-Proxy-V1.0-EVENTS feed.

If we return to the BlueCoat-Proxy-V1.0-EVENTS Feed tab, ensuring the Data hyper-link is selected, and then refresh ( refresh.svg ) the top pane that holds the summary of the latest Feed streams

images/HOWTOs/UI-FeedProcessing-66.png

Stroom UI Enable Processing - Configured Processors - Feed Display

We see a new entry in the table. The columns display

  • Created - The time the stream was created.
  • Type - The type of stream. Our new entry has a type of ‘Events’ as we have processed our Raw Events data.
  • Feed - The name of the stream’s feed
  • Pipeline - The name of the pipeline involved in the generation of the stream
  • Raw - The size in bytes of the raw stream data
  • Disk - The size in bytes of the raw stream data when stored in compressed form on the disk
  • Read - The number of records read by a pipeline
  • Write - The number of records (events) written by a pipeline. In this case the difference is that we did not generate events for the Software or Version records we read.
  • Fatal - The number of fatal errors the pipeline encountered when processing this stream
  • Error - The number of errors the pipeline encountered when processing this stream
  • Warn - The number of warnings the pipeline encountered when processing this stream
  • Info - The number of informational alerts the pipeline encountered when processing this stream
  • Retention - The retention period for this stream of data

If we also refresh ( refresh.svg ) the specific feed pane (middle) we again see a new entry of the Events Type

images/HOWTOs/UI-FeedProcessing-67.png

Stroom UI Enable Processing - Configured Processors - Specific Feed Display

If we select (left click) on the Events Type in either pane, we will see that the data pane displays the first event in the GCHQ Stroom Event Logging XML Schema form.

images/HOWTOs/UI-FeedProcessing-68.png

Stroom UI Enable Processing - Configured Processors - Event Display

We can now send a file of BlueCoat Proxy logs to our Stroom instance from a Linux host using the curl command and see how Stroom automatically processes the file. Use the command

curl \
-k \
--data-binary @sampleBluecoat.log \
https://stroomp.strmdev00.org/stroom/datafeed \
-H"Feed:BlueCoat-Proxy-V1.0-EVENTS" \
-H"Environment:Development" \
-H"LogFileName:sampleBluecoat.log" \
-H"MyHost:\"somenode.strmdev00.org\"" \
-H"MyIPaddress:\"192.168.2.220 192.168.122.1\"" \
-H"System:Site http://log-sharing.dreamhosters.com/ Bluecoat Logs" \
-H"Version:V1.0"

After Stroom’s Proxy aggregation has occurred, we will see that the new file posted via curl has been loaded into Stroom as per

images/HOWTOs/UI-FeedProcessing-69.png

Stroom UI Enable Processing - Configured Processors - New Posted Stream

and this new Raw Event stream is automatically processed a few seconds later as per

images/HOWTOs/UI-FeedProcessing-70.png

Stroom UI Enable Processing - Configured Processors - New Posted Stream Processed

We note that since we have used the same sample file again, the Stream sizes and record counts are the same.

If we switch to the Processors tab of the pipeline we see that the Tracker timestamp has changed and the number of Streams processed has increased.

images/HOWTOs/UI-FeedProcessing-71.png

Stroom UI Enable Processing - Configured Processors - New Posted Stream Processors