XSLT is a language that is typically used for transforming XML documents into either a different XML document or plain text.
XSLT is a key part of Stroom’s pipeline processing, as it is used to normalise bespoke events into a common XML audit event document conforming to the event-logging XML Schema.
Once a text file has been converted into intermediary XML (or the feed is already XML), XSLT is used to translate the XML into the event-logging XML format.
The XSLTFilter pipeline element defines the XSLT document and is used to transform the input XML into XML or plain text.
You can have multiple XSLTFilter elements in a pipeline if you want to break the transformation into steps, or wish to have simpler XSLTs that can be reused.
Raw Event feeds are typically translated into the event-logging:3 schema and Raw Reference feeds into the reference-data:2 schema.
1 - XSLT Basics
The basics of using XSLT and the XSLTFilter element.
XSLT is a very powerful language and allows the user to perform very complex transformations of XML data.
This documentation does not aim to teach you how to write XSLT documents; for that, we strongly recommend online references (e.g. W3Schools) or a book covering XSLT 2.0 and XPath.
It does, however, document the aspects of XSLT that are specific to its use in Stroom.
Examples
Event Normalisation
Here is an example XSLT document that transforms XML data in the records:2 namespace (the output of the DSParser element) into event XML in the event-logging:3 namespace.
It is an example of event normalisation from a bespoke format.
Warning
This example aims to show some typical uses of XSLT in a typical Stroom use case.
It does not necessarily represent best practice in terms of creation of a normalised event.
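As a minimal sketch of this kind of translation, assume a hypothetical bespoke logon log whose DSParser output contains data elements named date, user and host. The system name, environment, generator and TypeId values are placeholders, and the element choices have not been validated against a particular event-logging schema version.

```xml
<xsl:stylesheet
    version="2.0"
    xpath-default-namespace="records:2"
    xmlns="event-logging:3"
    xmlns:stroom="stroom"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Wrap all input records in a single Events root element -->
  <xsl:template match="records">
    <Events
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="event-logging:3 file://event-logging-v3.4.2.xsd"
        Version="3.4.2">
      <xsl:apply-templates />
    </Events>
  </xsl:template>

  <!-- Turn each record into one Event; the field names date, user and host are hypothetical -->
  <xsl:template match="record">
    <Event>
      <EventTime>
        <TimeCreated>
          <xsl:value-of select="stroom:format-date(string(data[@name='date']/@value), 'yyyy/MM/dd HH:mm:ss')" />
        </TimeCreated>
      </EventTime>
      <EventSource>
        <System>
          <!-- Placeholder values -->
          <Name>ExampleSystem</Name>
          <Environment>Production</Environment>
        </System>
        <Generator>ExampleApp</Generator>
        <Device>
          <HostName><xsl:value-of select="data[@name='host']/@value" /></HostName>
        </Device>
        <User>
          <Id><xsl:value-of select="data[@name='user']/@value" /></Id>
        </User>
      </EventSource>
      <EventDetail>
        <TypeId>Logon</TypeId>
        <Authenticate>
          <Action>Logon</Action>
        </Authenticate>
      </EventDetail>
    </Event>
  </xsl:template>
</xsl:stylesheet>
```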
Reference Data
Here is an example of transforming Reference Data in the records:2 namespace (the output of the DSParser element) into XML in the reference-data:2 namespace that is suitable for loading using the ReferenceDataFilter element.
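A minimal sketch of such a translation, assuming hypothetical input fields named userId and location and using the USER_ID_TO_LOCATION map name as an example. The schemaLocation file name and version attribute are assumptions; match them to the reference-data schema installed in your Stroom.

```xml
<xsl:stylesheet
    version="2.0"
    xpath-default-namespace="records:2"
    xmlns="reference-data:2"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Wrap all records in a single referenceData root element -->
  <xsl:template match="records">
    <referenceData
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="reference-data:2 file://reference-data-v2.0.xsd"
        version="2.0.1">
      <xsl:apply-templates />
    </referenceData>
  </xsl:template>

  <!-- Each record becomes one map/key/value entry -->
  <xsl:template match="record">
    <reference>
      <map>USER_ID_TO_LOCATION</map>
      <key><xsl:value-of select="data[@name='userId']/@value" /></key>
      <value><xsl:value-of select="data[@name='location']/@value" /></value>
    </reference>
  </xsl:template>
</xsl:stylesheet>
```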
Identity Transformation
If you want an XSLT to decorate an Events XML document with some additional data, or to change it slightly without changing its namespace, then a good starting point is the identity transformation.
<xsl:stylesheet
version="2.0"
xpath-default-namespace="event-logging:3"
xmlns="event-logging:3"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Match Root Object -->
<xsl:template match="Events">
<Events
xmlns="event-logging:3"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="event-logging:3 file://event-logging-v3.4.2.xsd"
Version="3.4.2">
<xsl:apply-templates />
</Events>
</xsl:template>
<!-- Whenever you match any node or any attribute -->
<xsl:template match="node()|@*">
<!-- Copy the current node -->
<xsl:copy>
<!-- Including any attributes it has and any child nodes -->
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
This XSLT copies every node and attribute as is, returning the input document completely unchanged.
You can then add templates that match specific elements and modify them, for example decorating a user’s UserDetails elements with a value obtained from a reference data lookup on the user ID.
Note
You can insert this identity skeleton into an XSLT editor using this editor snippet.
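As a variation on that idea, the following template could be added to the identity skeleton to insert a Name element straight after the Id of any User that lacks one. It is a sketch only: USER_ID_TO_NAME is a hypothetical reference map, the xsl:stylesheet element would also need xmlns:stroom="stroom", and you should check the element order against your event-logging schema version.

```xml
<!-- Higher priority than the node()|@* identity template, so it wins for these Id elements -->
<xsl:template match="User[not(Name)]/Id">
  <!-- Copy the Id element itself unchanged -->
  <xsl:copy-of select="." />
  <!-- Hypothetical map USER_ID_TO_NAME: user ID to display name -->
  <xsl:variable name="name" select="stroom:lookup('USER_ID_TO_NAME', string(.))" />
  <!-- Only add a Name if the lookup returned something -->
  <xsl:if test="$name">
    <Name><xsl:value-of select="$name" /></Name>
  </xsl:if>
</xsl:template>
```

Because the Name is emitted in place of the Id, the rest of the User element is copied in its original order by the identity template.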
<xsl:message>
Stroom supports the standard <xsl:message> element from the http://www.w3.org/1999/XSL/Transform namespace.
This element behaves in a similar way to the stroom:log() XSLT function.
The element text is logged to the Error stream with a default severity of ERROR.
A child element can optionally be used to set the severity level (one of FATAL|ERROR|WARN|INFO).
The namespace of this child element does not matter.
You can also set the attribute terminate="yes" to log the message at severity FATAL and halt processing of that stream part.
If the stream is multi-part then processing will continue with the next part.
Note
Setting terminate="yes" will trump any severity defined by a child element.
It will always be logged at FATAL.
The following are some examples of using <xsl:message>.
<!-- Log a message using default severity of ERROR -->
<xsl:message>Invalid length</xsl:message>
<!-- terminate="yes" means log the message as a FATAL ERROR and halt processing of the stream part -->
<xsl:message terminate="yes">Invalid length</xsl:message>
<!-- Log a message with a child element name specifying the severity. -->
<xsl:message>
<warn>Invalid length</warn>
</xsl:message>
<!-- Log a message with a child element name specifying the severity. -->
<xsl:message>
<info>Invalid length</info>
</xsl:message>
<!-- Log a message, specifying the severity and using a dynamic value. -->
<xsl:message>
<info>
<xsl:value-of select="concat('User ID ', $userId, ' is invalid')" />
</info>
</xsl:message>
2 - XSLT Functions
cidr-to-numeric-ip-range() - Converts a CIDR IP address range to an array of numeric IP addresses representing the start and end addresses of the range.
classification() - The classification of the feed for the data being processed
col-from() - The column in the input that the current record begins on (can be 0).
col-to() - The column in the input that the current record ends at.
current-time() - The current system time
current-user() - The current user logged into Stroom (only relevant for interactive use, e.g. search)
decode-url(String encodedUrl) - Decode the provided url.
dictionary(String name) - Loads the contents of the named dictionary for use within the translation
encode-url(String url) - Encode the provided url.
feed-attribute(String attributeKey) - The value for the supplied feed attributeKey. NOTE: This function is deprecated; use meta(String key) instead.
feed-name() - Name of the feed for the data being processed
fetch-json(String url) - Simplistic version of http-call that sends a request to the passed url and converts the JSON response body to XML using json-to-xml.
Currently does not support SSL configuration like http-call does.
format-date(String milliseconds) - Format a date that is specified as a number of milliseconds since a standard base time known as “the epoch”, namely January 1, 1970, 00:00:00 GMT
get(String key) - Returns the value associated with a key that has been stored in a map using the put() function.
The map is in the scope of the current pipeline process so values do not live after the stream has been processed.
hash(String value) - Hash a string value using the default SHA-256 algorithm and no salt
hash(String value, String algorithm, String salt) - Hash a string value using the specified hashing algorithm and supplied salt value.
Supported hashing algorithms include SHA-256, SHA-512, MD5.
hex-to-dec(String hex) - Convert hex to dec representation.
hex-to-oct(String hex) - Convert hex to oct representation.
meta(String key) - Lookup a meta data value for the current stream using the specified key.
The key can be Feed, StreamType, CreatedTime, EffectiveTime, Pipeline or any other attribute supplied when the stream was sent to Stroom, e.g. meta('System').
meta-keys() - Returns an array of meta keys for the current stream. Each key can then be used to retrieve its corresponding meta value, by calling meta($key).
numeric-ip(String ipAddress) - Convert an IP address to a numeric representation for range comparison
part-no() - The current part within a multi-part aggregated input stream (also known as the substream number), 1-based.
parse-uri(String URI) - Returns an XML structure of the URI providing authority, fragment, host, path, port, query, scheme, schemeSpecificPart, and userInfo components if present.
pipeline-name() - Get the name of the pipeline currently processing the stream.
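A short sketch of several of these functions used together to record processing context in event-logging Data elements. It assumes xmlns:stroom="stroom" is declared on the stylesheet, that the fragment sits where Data elements are permitted, and that $userId is a hypothetical variable holding a raw user ID.

```xml
<!-- Attribute value templates ({...}) evaluate each function call -->
<Data Name="Feed" Value="{stroom:feed-name()}" />
<Data Name="Pipeline" Value="{stroom:pipeline-name()}" />
<Data Name="StreamType" Value="{stroom:meta('StreamType')}" />
<Data Name="PartNo" Value="{stroom:part-no()}" />
<!-- SHA-256 with no salt, per the single-argument hash() form -->
<Data Name="UserIdHash" Value="{stroom:hash($userId)}" />
```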
bitmap-lookup()
The bitmap-lookup() function takes a bitmap key and, for each set bit position, looks up a value (which can be an XML node set) from reference or context data and adds it to the resultant XML.
map - The name of the reference data map to perform the lookup against.
key - The bitmap value to lookup.
This can be represented either as a decimal integer (e.g. 14) or as hexadecimal by prefixing with 0x (e.g. 0xE).
time - Determines which set of reference data was effective at the requested time.
If no reference data exists with an effective time before the requested time then the lookup will fail.
Time is in the format yyyy-MM-dd'T'HH:mm:ss.SSSXX, e.g. 2010-01-01T00:00:00.000Z.
ignoreWarnings - If true, any lookup failures will be ignored, else they will be reported as warnings.
trace - If true, additional trace information is output as INFO messages.
If the lookup fails, no result will be returned.
The key is a bitmap expressed as either a decimal integer or a hexadecimal value, e.g. 14/0xE is 1110 as a binary bitmap.
For each bit position that is set (i.e. has a binary value of 1), a lookup will be performed using that bit position as the key.
In this example, positions 1, 2 and 3 are set, so a lookup would be performed for each of these bit positions.
The results of the lookups are concatenated together in bit position order, separated by a space.
If ignoreWarnings is true then any lookup failures will be ignored and the function will return the value(s) for the bit positions it was able to look up.
This function can be useful when you have a set of values that can be represented as a bitmap and you need them converted back to individual values.
For example, if you have a set of additive account permissions (e.g. Admin, ManageUsers, PerformExport, etc.), each of which is associated with a bit position, then a user’s permissions could be defined as a single decimal/hex bitmap value.
Thus a bitmap lookup with this value would return all the permissions held by the user.
For example, the reference data store may contain:
Key (bit position)  Value
0                   Administrator
1                   Manage_Users
2                   Perform_Export
3                   View_Data
4                   Manage_Jobs
5                   Delete_Data
6                   Manage_Volumes
The following are example lookups using the above reference data:
Lookup key (decimal)  Lookup key (hex)  Bitmap   Result
0                     0x0               0000000  -
1                     0x1               0000001  Administrator
74                    0x4A              1001010  Manage_Users View_Data Manage_Volumes
2                     0x2               0000010  Manage_Users
96                    0x60              1100000  Delete_Data Manage_Volumes
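A sketch of a bitmap lookup against the table above, assuming it has been loaded into a hypothetical map named USER_PERMISSIONS and that each returned value should become its own (equally hypothetical) Permission element:

```xml
<!-- 0x4A (decimal 74) has bits 1, 3 and 6 set -->
<xsl:variable name="permissions" select="stroom:bitmap-lookup('USER_PERMISSIONS', '0x4A')" />
<!-- Per the table above, the space-separated result holds Manage_Users, View_Data and Manage_Volumes -->
<!-- normalize-space() guards against stray spaces; an empty result yields no iterations -->
<xsl:for-each select="tokenize(normalize-space(string-join($permissions, ' ')), ' ')">
  <Permission><xsl:value-of select="." /></Permission>
</xsl:for-each>
```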
cidr-to-numeric-ip-range()
Converts a CIDR IP address range to an array of numeric IP addresses representing the start and end (broadcast) of the range.
When storing the result in a variable, declare its type as a sequence of strings (xs:string*).
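A sketch of storing the range and testing whether an address falls inside it. It assumes the stylesheet declares xmlns:stroom="stroom" and xmlns:xs="http://www.w3.org/2001/XMLSchema"; $ipAddress and the RangeStart, RangeEnd and InRange elements are hypothetical.

```xml
<!-- The as="xs:string*" type is required so both values are kept -->
<xsl:variable name="range" select="stroom:cidr-to-numeric-ip-range('192.168.1.0/24')" as="xs:string*" />
<!-- Item 1 is the numeric start address, item 2 the numeric end (broadcast) address -->
<RangeStart><xsl:value-of select="$range[1]" /></RangeStart>
<RangeEnd><xsl:value-of select="$range[2]" /></RangeEnd>
<!-- Compare a dotted IP address against the range using numeric-ip() -->
<xsl:variable name="ipNum" select="number(stroom:numeric-ip($ipAddress))" />
<xsl:if test="$ipNum ge number($range[1]) and $ipNum le number($range[2])">
  <InRange>true</InRange>
</xsl:if>
```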
dictionary()
The dictionary() function gets the contents of the specified dictionary for use during translation.
Its main use is to abstract the management of a set of keywords away from the XSLT, so that users can make quick alterations to a dictionary used by an XSLT without needing to understand the complexities of XSLT.
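A sketch of checking free text against a keyword dictionary. It assumes a hypothetical dictionary named 'Sensitive Words' with one word per line, a hypothetical $description variable, and xmlns:stroom="stroom" and xmlns:xs="http://www.w3.org/2001/XMLSchema" declared on the stylesheet.

```xml
<!-- Load the dictionary contents as text -->
<xsl:variable name="dict" select="stroom:dictionary('Sensitive Words')" />
<!-- Split into lines and drop blank ones -->
<xsl:variable name="words" select="tokenize($dict, '\r?\n')[normalize-space(.)]" as="xs:string*" />
<!-- Warn if any dictionary word appears in the description (case-insensitive) -->
<xsl:if test="some $w in $words satisfies contains(lower-case($description), lower-case(normalize-space($w)))">
  <xsl:value-of select="stroom:log('WARN', 'Description contains a sensitive word')" />
</xsl:if>
```

Editing the dictionary then changes the behaviour without touching the XSLT.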
format-date()
The format-date() function takes a date string, a Pattern and an optional TimeZone argument, parses the date using the pattern and returns it in the standard XML date format. The pattern must be a Java-based SimpleDateFormat; see Dates & Times for details.
If the optional TimeZone argument is present, the pattern must not include the time zone pattern tokens (z and Z).
A special time zone value of “GMT/BST” can be used to guess the time zone based on the date (BST during British Summer Time, otherwise GMT).
E.g. Convert a GMT date time “2009/12/01 12:34:11”
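A sketch of that conversion, assuming the three-argument form format-date(date, pattern, timeZone) implied by the Pattern and TimeZone arguments described above. The second call uses a summer date, chosen for illustration, to show the special GMT/BST value.

```xml
<!-- The pattern has no zone tokens, so the time zone is passed as the third argument -->
<xsl:value-of select="stroom:format-date('2009/12/01 12:34:11', 'yyyy/MM/dd HH:mm:ss', 'GMT')" />
<!-- Let Stroom decide between GMT and BST from the date itself -->
<xsl:value-of select="stroom:format-date('2009/06/01 12:34:11', 'yyyy/MM/dd HH:mm:ss', 'GMT/BST')" />
```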
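http-call()
The http-call() function sends an HTTP request to a URL and returns the response. Its arguments are url, headers, mediaType, data and clientConfig, in that order:
url - The URL to send the request to.
headers - A newline-delimited list of HTTP headers to send.
Each header is of the form key:value.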
mediaType - The media (or MIME) type of the request data, e.g. application/json.
If not set, application/json; charset=utf-8 will be used.
data - The data to send.
The data type should be consistent with mediaType.
Supplying the data argument means a POST request method will be used rather than the default GET.
clientConfig - A JSON object containing the configuration for the HTTP client to use, including any SSL configuration.
The function returns the response as XML with namespace stroom-http.
The XML includes the body of the response in addition to the status code, success status, message and any headers.
clientConfig
The client can be configured using a JSON object containing various optional configuration items.
The following is an example of the client configuration object with all keys populated.
This is an example of how to use the function call in your XSLT.
It is recommended to place the clientConfig JSON in a Dictionary to make it easier to edit and to avoid having to escape all the quotes.
...
<!-- Assumes the stylesheet declares xmlns:stroom="stroom" and xmlns:http="stroom-http" -->
<xsl:template match="record">
...
<!-- Read the client config from a Dictionary into a variable -->
<xsl:variable name="clientConfig" select="stroom:dictionary('HTTP Client Config')" />
<!-- Make the HTTP call (no headers, default media type and no data, so a GET) and store the response in a variable -->
<xsl:variable name="response" select="stroom:http-call('https://reqbin.com/echo', (), (), (), $clientConfig)" />
<!-- Apply 'response' templates to the response -->
<xsl:apply-templates mode="response" select="$response" />
...
</xsl:template>
<xsl:template mode="response" match="http:response">
<!-- Extract just the body of the response -->
<val><xsl:value-of select="./http:body/text()" /></val>
</xsl:template>
...
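A POST variant might look like the following sketch. The endpoint, header names, payload shape and $userId variable are hypothetical, and $clientConfig is assumed to be read from a Dictionary as in the example above.

```xml
<!-- Headers are newline-delimited key:value pairs -->
<xsl:variable name="headers" select="concat('Accept:application/json', codepoints-to-string(10), 'X-Source:stroom')" />
<!-- Build a small JSON body; &quot; yields a double quote inside the attribute value -->
<xsl:variable name="body" select="concat('{&quot;userId&quot;:&quot;', $userId, '&quot;}')" />
<!-- Supplying data makes this a POST rather than a GET -->
<xsl:variable name="response" select="stroom:http-call('https://example.com/api/users', $headers, 'application/json', $body, $clientConfig)" />
<!-- Reuse the 'response' mode templates to extract the body -->
<xsl:apply-templates mode="response" select="$response" />
```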
link()
Create a string that represents a hyperlink for display in a dashboard table.
The link type can be one of:
dialog : Display the content of the link URL within a Stroom popup dialog.
tab : Display the content of the link URL within a Stroom tab.
browser : Display the content of the link URL within a new browser tab.
dashboard : Used to launch a Stroom dashboard internally with parameters in the URL.
You can override the default title or URL of the target link when it opens in a tab or dialog. Both the dialog and tab types allow a title to be specified after a |, e.g. dialog|My Title.
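A sketch of building a dashboard link that opens in a titled dialog. It assumes the three-argument form stroom:link(title, url, type); the URL and the $userId variable are hypothetical.

```xml
<!-- Title, URL, then the link type with a custom dialog title after the | -->
<xsl:value-of select="stroom:link('View user', concat('https://example.com/users/', $userId), 'dialog|User details')" />
```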
log()
The log() function writes a message to the processing log with the specified severity.
Severities of INFO, WARN, ERROR and FATAL can be used.
Severities of ERROR and FATAL will result in records being omitted from the output if a RecordOutputFilter is used in the pipeline.
The RecWarn and RecError counts are affected by warnings and errors generated in this way, which makes this function useful for adding business rules to XML output.
E.g. Warn if a SID is not the correct length.
<xsl:if test="string-length($sid) != 7">
<xsl:value-of select="stroom:log('WARN', concat($sid, ' is not the correct length'))"/>
</xsl:if>
The same functionality can also be achieved using the standard xsl:message element; see <xsl:message>.
lookup()
The lookup() function looks up from reference or context data a value (which can be an XML node set) and adds it to the resultant XML.
map - The name of the reference data map to perform the lookup against.
key - The key to lookup. The key can be a simple string, an integer value in a numeric range or a nested lookup key.
time - Determines which set of reference data was effective at the requested time.
If no reference data exists with an effective time before the requested time then the lookup will fail.
Time is in the format yyyy-MM-dd'T'HH:mm:ss.SSSXX, e.g. 2010-01-01T00:00:00.000Z.
ignoreWarnings - If true, any lookup failures will be ignored, else they will be reported as warnings.
trace - If true, additional trace information is output as INFO messages.
If the lookup fails, no result will be returned.
By testing the result, you can output a default value when no result is returned.
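A sketch of a lookup with a default value. The $userId and $eventTime variables are hypothetical, with $eventTime assumed to be in the yyyy-MM-dd'T'HH:mm:ss.SSSXX format described above.

```xml
<!-- Look up the value that was effective at the time of the event -->
<xsl:variable name="location" select="stroom:lookup('USER_ID_TO_LOCATION', $userId, $eventTime)" />
<xsl:choose>
  <!-- A value was found: copy it, whether it is a string or an XML fragment -->
  <xsl:when test="$location">
    <xsl:copy-of select="$location" />
  </xsl:when>
  <!-- No value was found: emit a default instead -->
  <xsl:otherwise>Unknown</xsl:otherwise>
</xsl:choose>
```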
Reference data entries can be stored either with a single string key or with a key range that defines a numeric range, e.g. 1-100.
When a lookup is performed, the passed key is first looked up as if it were a normal string key.
If that lookup fails, Stroom will try to convert the key to an integer (long) value.
If it can be converted to an integer, then a second lookup will be performed against entries with key ranges to see if there is a key range that includes the requested key.
Range lookups can be used for looking up an IP address where the reference data values are associated with ranges of IP addresses.
In this use case, the IP address must first be converted into a numeric value using numeric-ip().
Similarly, the reference data must be stored with key ranges whose bounds were created using this function.
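A sketch of both halves of an IP range lookup. The map name GEO_IP_RANGES, the input fields startIp, endIp and location, and the $ipAddress variable are hypothetical; the range, from and to elements are assumed from the reference-data:2 schema, so check them against your schema version.

```xml
<!-- Reference data translation: store each range with bounds produced by numeric-ip() -->
<reference>
  <map>GEO_IP_RANGES</map>
  <range>
    <from><xsl:value-of select="stroom:numeric-ip(string(data[@name='startIp']/@value))" /></from>
    <to><xsl:value-of select="stroom:numeric-ip(string(data[@name='endIp']/@value))" /></to>
  </range>
  <value><xsl:value-of select="data[@name='location']/@value" /></value>
</reference>

<!-- Event translation: convert the event's IP address the same way, then look it up -->
<xsl:variable name="location" select="stroom:lookup('GEO_IP_RANGES', stroom:numeric-ip($ipAddress))" />
```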
Nested Maps
The lookup function allows you to perform chained lookups using nested maps.
For example you may have a reference data map called USER_ID_TO_LOCATION that maps user IDs to some location information for that user and a map called USER_ID_TO_MANAGER that maps user IDs to the user ID of their manager.
If you wanted to decorate a user’s event with the location of their manager you could use a nested map to achieve the lookup chain.
To perform the lookup set the map argument to the list of maps in the lookup chain, separated by a /, e.g. USER_ID_TO_MANAGER/USER_ID_TO_LOCATION.
This will perform a lookup against the first map in the list using the requested key.
If a value is found the value will be used as the key in a lookup against the next map.
The value from each map lookup is used as the key in the next map all the way down the chain.
The value from the last lookup is then returned as the result of the lookup() call.
If no value is found at any point in the chain then that results in no value being returned from the function.
In order to use nested map lookups each intermediate map must contain simple string values.
The last map in the chain can either contain string values or XML fragment values.
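A sketch of the manager location example described above, where $userId is a hypothetical variable holding the user ID from the event:

```xml
<!-- USER_ID_TO_MANAGER must hold simple string values (manager user IDs);
     USER_ID_TO_LOCATION, the last map, may hold strings or XML fragments -->
<xsl:variable name="managerLocation" select="stroom:lookup('USER_ID_TO_MANAGER/USER_ID_TO_LOCATION', $userId)" />
<!-- Nothing is returned if any step in the chain fails -->
<xsl:if test="$managerLocation">
  <xsl:copy-of select="$managerLocation" />
</xsl:if>
```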
put() and get()
You can put values into a map using the put() function.
These values can then be retrieved later using the get() function.
Values are stored against a key name so that multiple values can be stored.
These functions can be used for many purposes but are most commonly used to count a number of records that meet certain criteria.
The map is in the scope of the current pipeline process so values do not live after the stream has been processed.
Also, the map will only contain entries that were put() within the current pipeline process.
An example of how to count records is shown below:
<!-- Get the current record count (assumes the stroom prefix is bound to the "stroom" namespace) -->
<xsl:variable name="currentCount" select="number(stroom:get('count'))" />
<!-- Increment the record count -->
<xsl:variable name="count">
<xsl:choose>
<xsl:when test="$currentCount">
<xsl:value-of select="$currentCount + 1" />
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="1" />
</xsl:otherwise>
</xsl:choose>
</xsl:variable>
<!-- Store the count for future retrieval -->
<xsl:value-of select="stroom:put('count', $count)" />
<!-- Output the new count -->
<data name="Count">
<xsl:attribute name="Value" select="$count" />
</data>
meta-keys()
When calling this function and assigning the result to a variable, you must specify the variable data type of xs:string* (array of strings).
For example, meta-keys() can be used to emit all meta values for a given stream into an Event/Meta element.
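A sketch of that approach, assuming xmlns:stroom="stroom" and xmlns:xs="http://www.w3.org/2001/XMLSchema" are declared on the stylesheet. The data child element with name and value attributes is illustrative only; use whatever content your event-logging schema version allows inside Meta.

```xml
<!-- The xs:string* type is required when assigning meta-keys() to a variable -->
<xsl:variable name="metaKeys" select="stroom:meta-keys()" as="xs:string*" />
<Meta>
  <!-- Inside for-each the context item is the current key string -->
  <xsl:for-each select="$metaKeys">
    <data name="{.}" value="{stroom:meta(.)}" />
  </xsl:for-each>
</Meta>
```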
parse-uri()
The parse-uri() function takes a Uniform Resource Identifier (URI) in string form and returns an XML node with a namespace of uri containing the URI’s individual components of authority, fragment, host, path, port, query, scheme, schemeSpecificPart and userInfo. See either RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax or Java’s java.net.URI Class for details regarding the components.
The following XSLT fragment shows its use:
<!-- Display and parse the URI contained within the text of the rURI element -->
<xsl:variable name="u" select="stroom:parse-uri(rURI)" />
<URI>
<xsl:value-of select="rURI" />
</URI>
<URIDetail>
<xsl:copy-of select="$u"/>
</URIDetail>
pointIsInsideXYPolygon()
Returns true if the specified point is inside the specified polygon.
Useful for determining if a user is inside a physical zone based on their location and the boundary of that zone.
pointIsInsideXYPolygon(Number xPos, Number yPos, Number[] xPolyData, Number[] yPolyData)
Arguments:
xPos - The X value of the point to be tested.
yPos - The Y value of the point to be tested.
xPolyData - A sequence of X values that define the polygon.
yPolyData - A sequence of Y values that define the polygon.
The list of values supplied for xPolyData must correspond with the list of values supplied for yPolyData.
The points that define the polygon must be provided in order, i.e. starting from one point on the polygon and then traveling round the path of the polygon until it gets back to the beginning.
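A sketch using a 10 x 10 square zone with corners (0,0), (10,0), (10,10) and (0,10), listed in path order. The $x and $y variables are hypothetical numeric positions on the same grid; if the function turns out to require doubles, convert the values with number() first.

```xml
<!-- xPolyData and yPolyData pair up index by index to give the corner points -->
<xsl:if test="stroom:pointIsInsideXYPolygon($x, $y, (0, 10, 10, 0), (0, 0, 10, 10))">
  <xsl:value-of select="stroom:log('INFO', 'User is inside the zone')" />
</xsl:if>
```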
3 - XSLT Includes
Using an XSLT import to include XSLT from another translation.
You can use an XSLT import to include XSLT from another translation.
E.g.:
<xsl:import href="ApacheAccessCommon" />
This would include the XSLT from the ApacheAccessCommon translation.
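In standard XSLT, xsl:import elements must appear before any other top-level element, and any template in the importing translation takes precedence over an imported template that matches the same node. The sketch below assumes ApacheAccessCommon defines a template for record; the overriding template body is illustrative.

```xml
<xsl:stylesheet
    version="2.0"
    xpath-default-namespace="records:2"
    xmlns="event-logging:3"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Must be the first child of xsl:stylesheet -->
  <xsl:import href="ApacheAccessCommon" />

  <!-- Overrides any imported template that matches record -->
  <xsl:template match="record">
    <!-- Delegate to the imported template, then add anything specific to this feed -->
    <xsl:apply-imports />
  </xsl:template>
</xsl:stylesheet>
```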