Community

The event-logging XML Schema is an open source product whose evolution is enhanced by the community of users and developers that contribute improvements. This section provides the resources for this community, to help with evolving and documenting the Schema.

1 - Developer Guide

This section is intended for developers contributing to the development of the Schema design. It describes how to do some common tasks, such as releasing and building documentation.

1.1 - Contributing

How to make contributions to the event-logging schema.

We love pull requests and we want to make it as easy as possible to contribute changes.

Getting started

  • Make sure you have a GitHub account.
  • Consider creating a GitHub issue first. Is this a comment or documentation change? Does an issue already exist? If you need an issue then describe it in as much detail as you can, e.g. step-by-step instructions to reproduce the problem.
  • Fork the repository on GitHub.
  • Clone your fork of the repository.
  • Create a branch for your change, probably from the master branch. Please don’t work on master. Convention is for the branch name to include the issue number like so: git checkout -b gh-1234-fix-thing-x.

Making changes

  • New elements should have <xs:annotation> elements attached to them to provide documentation about the element.
  • You should also update the documentation for any changes.
  • The revised schema should be valid XML and successfully validate against the XMLSchema standard.
  • Be very mindful of making breaking changes.
  • New elements should in most cases be optional as not all source systems can provide all data fields. Enforcing mandatory data can be done outside of the schema.

Submitting changes

  • Sign the Contributor Licence Agreement.
  • Push your changes to your fork.
  • Submit a pull request.
  • We’ll look at it pretty soon after it’s submitted, and we aim to respond within one week.

Getting it accepted

Here are some things you can do to make this all smoother:

  • If you think it might be controversial then discuss it with us beforehand, via a GitHub issue.
  • Write a good commit message.

1.2 - Schema Style Guide

A style guide for developers contributing to the design of the schema.

Breaking changes

When making changes to the Schema be mindful of the impact of those changes on XML documents that were valid against the previous version of the Schema. See Release considerations.

Naming conventions

All elements, attributes and types should be named in UpperCamelCase, e.g. EventSource, GroupComplexType, etc.

Any initialisms that form part of a name can be in all capitals, e.g. MAC in DeviceMACAddressSimpleType.

Groups

Groups should be suffixed with Group, e.g. InstallationGroup.

Simple types

Simple types should be suffixed with SimpleType, e.g. LatitudeSimpleType.

Complex types

Complex types should be suffixed with ComplexType, e.g. PrintComplexType.
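
The naming rules above lend themselves to a simple automated check. The following is a minimal sketch only, not part of the project's build; the function name check_name and the kind labels used as dictionary keys are hypothetical and chosen purely for illustration.

```python
import re

# UpperCamelCase: starts with an upper-case letter, then letters/digits only.
# Runs of capitals are allowed so that initialisms such as MAC pass.
UPPER_CAMEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")

# Required suffix for each kind of schema item, per the style guide.
# The keys are illustrative labels, not names used by any tool.
SUFFIXES = {
    "group": "Group",
    "simpleType": "SimpleType",
    "complexType": "ComplexType",
}


def check_name(kind, name):
    """Return a list of style problems for a named schema item (empty if none)."""
    problems = []
    if not UPPER_CAMEL.match(name):
        problems.append(f"{name}: not UpperCamelCase")
    suffix = SUFFIXES.get(kind)
    if suffix and not name.endswith(suffix):
        problems.append(f"{name}: {kind} names should end with {suffix}")
    return problems


if __name__ == "__main__":
    for kind, name in [
        ("simpleType", "DeviceMACAddressSimpleType"),
        ("group", "Installation"),
    ]:
        print(kind, name, check_name(kind, name))
```

Elements and attributes have no required suffix, so only the UpperCamelCase rule is applied to them.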

Elements vs attributes

Elements are preferred.

Cardinality

Mandatory elements and attributes should be used with caution. Adding a new mandatory element/attribute is a breaking change, and it also imposes a requirement for an event to have the data to populate that element/attribute. The Schema has to support events coming from a wide variety of source systems which may only supply minimal information and whose log output cannot be changed. For this reason the vast majority of elements are optional, so as not to impose requirements that a source system's data cannot meet.

Similarly use of constraints should be avoided as these would impose unreasonable restrictions.

Use of anonymous complex types

When adding child content the use of anonymous (inline, unnamed) <xs:complexType> elements should be avoided as this causes problems for the auto-generated Java code produced for the event-logging Java library. Instead create a new named complex type, which will ensure the type is mapped to a Java class for a cleaner Java API.
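
One way to spot anonymous complex types before submitting a change is to scan the schema for xs:complexType elements nested directly inside an xs:element. The sketch below is illustrative only and is not part of the project's build; the function name and the sample schema (Inline, Named, NamedComplexType) are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def find_anonymous_complex_types(xsd_text):
    """Return names of elements that declare an inline (anonymous) complex type."""
    root = ET.fromstring(xsd_text)
    offenders = []
    for element in root.iter(f"{XS}element"):
        # An anonymous type is an xs:complexType that is a direct child
        # of an xs:element rather than a named top-level type.
        if element.find(f"{XS}complexType") is not None:
            offenders.append(element.get("name", "<unnamed>"))
    return offenders


# Hypothetical schema fragment: one inline type (discouraged) and one
# element that refers to a named complex type (preferred).
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Inline">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Child" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="Named" type="NamedComplexType"/>
  <xs:complexType name="NamedComplexType">
    <xs:sequence>
      <xs:element name="Child" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""

if __name__ == "__main__":
    print(find_anonymous_complex_types(SAMPLE))
```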

Order

Top level schema items

When adding new top level items, e.g. a root element or a complex type, they should be added in alphabetical order within the following groups, with the groups in this order:

  1. Root elements
  2. Groups
  3. Complex types
  4. Simple types

Sequences

Elements in a sequence do not need to follow a strict order. They should however be ordered in a logical manner with the most frequently used elements first. If the sequence includes a Data element of type evt:DataComplexType then this should be the last element and be optional as it is used for extensibility.

Annotations

The Schema should be self-documented as far as possible with <xs:annotation> elements added to all elements, attributes and types to describe their purpose. These annotations are pulled through into the Javadoc of the event-logging Java library so the more information that can be provided, the better the Java API will be.
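
Missing annotations can be found with a short script. The following is a rough sketch under the assumption that every named element, attribute and type should carry an xs:annotation child; the function name and sample schema are hypothetical and it is not part of the project's build.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Schema items that the style guide expects to be annotated.
DOCUMENTED_TAGS = {
    f"{XS}element",
    f"{XS}attribute",
    f"{XS}complexType",
    f"{XS}simpleType",
}


def find_undocumented(xsd_text):
    """Return names of named schema items that have no xs:annotation child."""
    root = ET.fromstring(xsd_text)
    missing = []
    for node in root.iter():
        if node.tag in DOCUMENTED_TAGS and node.get("name"):
            if node.find(f"{XS}annotation") is None:
                missing.append(node.get("name"))
    return missing


# Hypothetical schema fragment with one annotated and one bare element.
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Documented" type="xs:string">
    <xs:annotation>
      <xs:documentation>Describes the purpose of the element.</xs:documentation>
    </xs:annotation>
  </xs:element>
  <xs:element name="Undocumented" type="xs:string"/>
</xs:schema>"""

if __name__ == "__main__":
    print(find_undocumented(SAMPLE))
```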

Data elements

Data elements of type evt:DataComplexType should be included in most places in the Schema to allow for extensibility of all aspects of the Schema.

1.3 - Release Process

How to release a new version of the schema.

Release considerations

Before releasing a new version of the schema you need to consider the impact of the changes in the new release. A change to the schema can make documents that were valid against the current version no longer valid when validated with the new version. This would be classed as a breaking change.

A non-exhaustive list of examples of changes that are considered breaking with respect to XML documents being validated against the schema:

  • New mandatory xs:element or xs:attribute
  • Removal of an existing xs:element or xs:attribute
  • Removal of an existing xs:choice item
  • Changing an existing xs:element to be mandatory
  • Changing the position of an element within an xs:sequence
  • Changing the position of an attribute within an xs:element
  • Renaming an xs:element or xs:attribute
  • Change to an element/attribute’s enumeration/pattern that would invalidate formerly valid values

An indirect breaking change also occurs when the schema is changed in a way that results in a breaking change to the event-logging Java library. The event-logging Java library consists of Java code that is auto-generated from the schema. If for example a complex type in the schema is renamed then this would result in a Java class being renamed and this would be a breaking change for any client systems that use the library.

A non-exhaustive list of examples of changes that are considered breaking with respect to the event-logging Java library:

  • All changes listed above that are breaking for an XML document
  • Renaming a complex type
  • Renaming a simple type

Version number

The changes made to the schema will dictate the version number. The event-logging schema follows Semantic Versioning, i.e. MAJOR.MINOR.PATCH. A MAJOR change is one that is breaking in terms of XML documents that validate with the schema, e.g. the removal of an element. A MINOR change is a non-breaking structural change, e.g. the addition of an optional element. A PATCH change is a very minor non-structural change, e.g. adding/updating an element’s annotation.

A change that would break the event-logging Java library but not XML documents, e.g. the rename of a complex type, does not require a change to the MAJOR version part. In this instance the event-logging library would be released with its version indicating a MAJOR change.
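
As an illustration of the rules above, the next schema version can be derived from the current version and the most significant kind of change. This is a sketch only; the function name and the change labels ("breaking", "structural", "annotation") are hypothetical shorthand for the MAJOR, MINOR and PATCH cases described above, and resetting the lower parts to zero follows normal Semantic Versioning practice.

```python
def next_version(current, change):
    """Return the next vMAJOR.MINOR.PATCH version for a given kind of change.

    change is one of (illustrative labels only):
      "breaking"   - breaks previously valid XML documents  -> MAJOR bump
      "structural" - non-breaking structural change         -> MINOR bump
      "annotation" - non-structural change, e.g. annotation -> PATCH bump
    """
    major, minor, patch = (int(part) for part in current.lstrip("v").split("."))
    if change == "breaking":
        return f"v{major + 1}.0.0"
    if change == "structural":
        return f"v{major}.{minor + 1}.0"
    if change == "annotation":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown change type: {change}")


if __name__ == "__main__":
    for change in ("breaking", "structural", "annotation"):
        print(change, next_version("v4.1.2", change))
```

A change that only breaks the Java library, such as renaming a complex type, would not be classed as "breaking" here because it does not invalidate XML documents.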

The version number should be in one of these forms:

  • v<MAJOR>.<MINOR>.<PATCH> - e.g. v4.1.2
  • v<MAJOR>.<MINOR>-beta.NN - e.g. v4.2-beta.1

Beta versions can be used when you need to release a version to trial new changes. With a beta version it is accepted that each iteration of the MAJOR.MINOR beta may be breaking with respect to previous iterations of that beta, e.g. one beta iteration adds a new element then a subsequent iteration removes it.
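
The two version forms can be expressed as regular expressions, which is handy for sanity-checking a tag name before tagging. This is a minimal sketch, assuming NN stands for one or more digits; the function name is hypothetical and the project's own build scripts may validate versions differently.

```python
import re

# v<MAJOR>.<MINOR>.<PATCH>, e.g. v4.1.2
RELEASE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")
# v<MAJOR>.<MINOR>-beta.NN, e.g. v4.2-beta.1 (NN assumed to be digits)
BETA = re.compile(r"^v(\d+)\.(\d+)-beta\.(\d+)$")


def classify_version(tag):
    """Return "release", "beta", or None if the tag matches neither form."""
    if RELEASE.match(tag):
        return "release"
    if BETA.match(tag):
        return "beta"
    return None


if __name__ == "__main__":
    for tag in ("v4.1.2", "v4.2-beta.1", "4.1.2", "v4.2.0-SNAPSHOT"):
        print(tag, classify_version(tag))
```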

Namespace version

Strictly speaking the namespace version in the schema (e.g. event-logging:3) should be changed for each MAJOR version change, however changing the namespace currently causes significant pain within Stroom so for now the namespace version will stay at 3 and be out of step with the MAJOR version.

Documentation

Any changes to the schema should be fully documented. This includes documentation in the schema in the form of element annotations and changes/additions to the documentation site in ./docs. All of the documentation source files should be applicable to one minor version of the schema, so any additions/changes/removals to the schema should be reflected in the documentation.

The annotations in the schema are used to generate Javadoc in the event-logging Java library so it is key that they are as thorough as possible.

Release branches

To allow us to support multiple versions of the schema there will be a git branch for each minor release, e.g. 4.0, 4.1, 5.0, etc. Due to the way that the documentation is tied to release branch names all releases of the schema MUST be released from the appropriate release branch, e.g. tag/version v4.1.2 from branch 4.1.
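
The mapping from a release tag to its release branch is simply the MAJOR.MINOR part of the version without the v prefix. A minimal sketch of that mapping follows; the function name is hypothetical, and treating beta tags the same way as full releases is an assumption rather than something stated above.

```python
import re

# Accepts v<MAJOR>.<MINOR>.<PATCH> and v<MAJOR>.<MINOR>-beta.NN tags.
TAG = re.compile(r"^v(\d+)\.(\d+)(?:\.\d+|-beta\.\d+)$")


def release_branch_for(tag):
    """Return the release branch name (MAJOR.MINOR) for a version tag."""
    match = TAG.match(tag)
    if match is None:
        raise ValueError(f"Not a recognised version tag: {tag}")
    return f"{match.group(1)}.{match.group(2)}"


if __name__ == "__main__":
    print(release_branch_for("v4.1.2"))
```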

Release Steps

  1. Decide what the new version number will be based on the changes made, see Version number above.
  2. Ensure all required changes have been made to event-logging.xsd with any new elements suitably annotated.
    1. Change the version attribute to the intended version number, e.g. 4.1.2
    2. Change the id attribute to the intended version number, e.g. event-logging-v4.1.2
    3. Change the single enumeration in VersionSimpleType to match the new version number.
  3. Ensure all required changes have been made to the documentation.
  4. Run the build for a set version number to ensure the CI build will pass, e.g.
    ./gradlew clean build -Pversion=v4.1.2
    When the build runs it will do the following:
    • Validate the version numbers in the master schema file against the gradle version argument.
    • Run the transformer pipelines to generate the various schema variants (including validating them).
    • Compare the generated schemas with the ones from the latest release so you can check what has changed.
  5. Ensure all changes to the schema have been logged with ./log_change.sh.
  6. Add a page to the releases section of the documentation for the version being released.
  7. Commit and push all the changes.
  8. Tag the release using ./tag_release.sh which will initiate the CI release process.
  9. Add the new schema to stroom-content
    1. Copy the new schema file into stroom-content/source/event-logging-xml-schema/stroomContent/XML Schemas/event-logging/ naming it something like event-logging v4.1.2.XMLSchema.data.xsd.
    2. Copy the latest .XMLSchema.xml file into one named for the new version, e.g. event-logging v4.1.2.XMLSchema.xml
    3. Edit this new .XMLSchema.xml file:
      1. Update the <name> tag to reflect the new version number
      2. If the major version number has changed update the <namespaceURI> tag with the new major version number
      3. Update the <systemId> tag to reflect the new version number
      4. Replace the content of the <uuid> tag with a newly generated UUID. You can use the linux binary uuidgen to generate a new UUID.
    4. Update the CHANGELOG.md file in stroom-content/source/event-logging-xml-schema/, probably copying the content from the CHANGELOG in the event-logging-schema git repo.
    5. Run the build to build the new pack
    6. Commit and push the changes
    7. In GitHub create a new release for the updated pack.
  10. Update the version numbers in event-logging.xsd
    1. Change the version attribute to the next intended version number with a SNAPSHOT suffix, e.g. 4.2.0-SNAPSHOT
    2. Change the id attribute to the next intended version number with a SNAPSHOT suffix, e.g. event-logging-v4.2.0-SNAPSHOT
    3. Commit and push the change

1.4 - Schema Variants

A guide to the different variants of the schema and the process that creates them.

The schema has a number of different variants that are all derived from the master event-logging.xsd schema in the root of this repository. The following variants are published as release artefacts:

Suffix | Description
-      | The full schema for use by stroom to validate decorated events.
client | The client schema for use by client systems and the event-logging Java library for sending un-decorated events. Lacks some <UserDetails> child elements, adds an <Event> root element to allow clients to send individual events and removes the Event/@Id attribute as this is for use on decorated events only.
safe   | A version of the schema for validating data from untrusted sources. It limits the number of occurrences of elements and the characters that can be used.

These variants are generated by the Java application in event-logging-transformer-main/ which is configured by event-logging-transformer-main/pipelines/configuration.yml.

This Java application is run as part of the Gradle build. The generated schemas are output to event-logging-transformer-main/pipelines/generated/.

1.5 - Frequently Asked Questions

How can I check the schema is valid before submitting a pull request?

Run the following (which relies on libxml2-utils):

xmllint --noout --schema http://www.w3.org/2001/XMLSchema.xsd event-logging-vX.X.X.xsd

2 - Documenting the Schema

This section covers the development and maintenance of this documentation site. The documentation is a community effort with contributions from developers and users.

This site is built using Hugo with the Docsy Hugo theme. The content is predominantly authored in Markdown with some Hugo shortcodes.

2.1 - Building the Documentation

How to develop and build the documentation.

Prerequisites

In order to build and contribute to the documentation you will need Docker installed.

Docker is required as all the build steps are performed in Docker containers to ensure a consistent and known build environment. It also ensures that the local build environment matches that used in GitHub Actions.

It is possible to build the docs without Docker but you would need to install all the other dependencies that are provided in the Docker images, e.g. Hugo, npm, etc.

Cloning the event-logging-schema git repository

The git repository for this site is event-logging-schema. It uses the Docsy theme (themes/docsy/) which is pulled in via Go modules. To clone the repository:

# Clone the repo
git clone https://github.com/gchq/event-logging-schema.git
(out)Cloning into 'event-logging-schema'...
(out)remote: Enumerating objects: 66006, done.
(out)remote: Counting objects: 100% (7916/7916), done.
(out)remote: Compressing objects: 100% (1955/1955), done.
(out)remote: Total 66006 (delta 3984), reused 7417 (delta 3603), pack-reused 58090
(out)Receiving objects: 100% (66006/66006), 286.61 MiB | 7.31 MiB/s, done.
(out)Resolving deltas: 100% (34981/34981), done.
cd event-logging-schema

Running a local server

The documentation can be built and served locally while developing it. To build and serve the site run

./container_build/runInHugoDocker.sh server

This uses Hugo to build the site in memory and then serve it from a local web server. When any source files are changed or added Hugo will detect this and rebuild the site as required, including automatically refreshing the browser page to update the rendered view.

Once the server is running the site is available at localhost:1313.

Building the site locally

To perform a full build of the static site run:

./container_build/runInHugoDocker.sh build

This will generate all the static content and place it in public/.

Generating the PDF

Every page has a Print entire section link that will display a printable view of that section and its children. In addition to this, the GitHub Actions build generates a PDF of the docs section and all its children, i.e. all of the documentation (but not News/Releases or Community) in one PDF. This makes the documentation available for offline use.

To test the PDF generation do:

./container_build/runInPupeteerDocker.sh PDF

Updating the Docsy theme

The Docsy theme is pulled in as a Go module. To update the version of Docsy used see Update the Docsy Hugo Module.

When these instructions say to run the hugo command you need to run it using the builder container, unless you have Hugo and Go installed locally, e.g.

./container_build/runInHugoDocker.sh "hugo mod get -u github.com/google/docsy@v0.2.0"

2.2 - Schema Versions

How to manage documentation for different versions of the Schema.

The Docsy theme supports site versioning so that multiple versions of the site/documentation can exist and link between each other. For this documentation site, each version of the site is tied to a minor release of the Schema, e.g. 4.0, 4.1, 4.2, 5.0 etc. Each Schema version is represented by a git branch with the same name (without the v prefix). Documentation changes for an as yet unreleased Schema version would be performed on the master branch.

When the combined site is built, each version will exist within a directory as siblings of each other, i.e.

/4.0/
/4.1/
/4.2/
/5.0/

The master branch is NOT published to GitHub Pages or included in the release artefacts.

Versioned Site Configuration

To configure each version of the site so that it knows what version it is and what the other versions are you need to edit config.toml. This needs to be done on each branch in a way that is appropriate to each branch. If a change needs to be applied to all branches then it is best to make it in the oldest branch for which the documentation is published and then merge the changes up the chain, e.g. legacy => 4.0 => 4.1 => 4.2 => 5.0 => master.

The following config properties need to be amended on each branch. This example is from the 4.1 branch and is based on there being versions 4.0 and 4.1, with 4.1 being the latest.

4.1

[params]
  # Menu title if your navigation bar has a versions selector
  # to access old versions of your site.
  version_menu = "Schema 4.1"

  # If true, displays a banner on each page warning that
  # it is an old version. Set this to true on each git branch
  # of stroom-docs that is not the latest release branch
  archived_version = false

  # Used in the banner on each archived page.
  # Must match the value in brackets in "version_menu" above
  version = "4.1"

  # A link to latest version of the docs. Used in the
  # "version-banner" partial to point people to the main
  # doc site.
  url_latest_version = "/../4.1"

  # The name of the GitHub branch that this version of the
  # documentation lives on. Used for the GitHub links in the
  # top of the right hand sidebar. Should match the last part
  # of url_latest_version.
  github_branch = "4.1"

  # A set of all the versions that are available.
  [[params.versions]]
    version = "4.1"
    url = "/../4.1"
  [[params.versions]]
    version = "4.0"
    url = "/../4.0"

In the same example scenario as above, the config.toml file for the 4.0 branch (which is not the latest version in this case) would be:

4.0

[params]
  # Menu title if your navigation bar has a versions selector
  # to access old versions of your site.
  version_menu = "Schema 4.0"

  # If true, displays a banner on each page warning that
  # it is an old version. Set this to true on each git branch
  # of stroom-docs that is not the latest release branch
  archived_version = true

  # Used in the banner on each archived page.
  # Must match the value in brackets in "version_menu" above
  version = "4.0"

  # A link to latest version of the docs. Used in the
  # "version-banner" partial to point people to the main
  # doc site.
  url_latest_version = "/../4.1"

  # The name of the GitHub branch that this version of the
  # documentation lives on. Used for the GitHub links in the
  # top of the right hand sidebar. Should match the last part
  # of url_latest_version.
  github_branch = "4.0"

  # A set of all the versions that are available.
  [[params.versions]]
    version = "4.1"
    url = "/../4.1"
  [[params.versions]]
    version = "4.0"
    url = "/../4.0"

Automated build process

The site is built by GitHub Actions on a nightly basis. This schedule is controlled by build_and_release.yml on the master branch.

This automated build will look for any branches matching the pattern (legacy|[0-9]+\.[0-9]+) and for each one will do the following:

  • Check out that branch
  • Build the site for that version using Hugo
    • Add the site files to a combined site
    • Generate the documentation PDF
  • Build the site with no other versions configured
    • Create a zip of the single version site
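
To illustrate which branches the pattern selects, the sketch below filters a list of branch names with the same regular expression. It assumes the pattern must match the whole branch name, which the text above does not state explicitly; the function name and the example branch list are hypothetical.

```python
import re

# Pattern quoted from the build description above.
BRANCH_PATTERN = re.compile(r"(legacy|[0-9]+\.[0-9]+)")


def release_branches(branches):
    """Return the branches that would be built, keeping their original order."""
    # fullmatch is an assumption: it requires the entire name to match.
    return [name for name in branches if BRANCH_PATTERN.fullmatch(name)]


if __name__ == "__main__":
    example = ["master", "legacy", "4.0", "4.1", "gh-1234-fix-thing-x", "5.0"]
    print(release_branches(example))
```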

Once each site has been processed it will:

  • Create a single zip file containing the combined site
  • Tag the release with a version number
  • Add the following release artefacts:
    • Single version site zips
    • Combined site zip
    • Single version PDFs
  • Create a root index.html file that will redirect to the latest version sub-directory.
  • Publish the combined site to GitHub Pages https://gchq.github.io/stroom-docs.

Although the build is run on the master branch it will use the HEAD commit of each of the release branches to build the site(s).

The build and release can be forced by adding the text [publish] to the commit message on master. This is useful in testing, or if updated documentation is needed for any reason.

Where to make changes

The nature of a change to the site/documentation will determine which git branch the change is made on.

Changes specific to a Schema version

Any changes that are specific to a Schema version, e.g. documenting a new feature in that version, should be made on the oldest branch that contains that feature. If the change relates to an as yet unreleased version of the Schema then make the change on master.

Changes to the News/Releases

Adding news items or release notes for new versions should be done on the latest release branch. The News/Releases section is not included in old versions when released.

Changing the site look

Ideally changes to the look of the site, e.g. upgrading the Docsy theme sub-module to a new commit, adding shortcodes or tweaking the CSS should be done on all branches so when switching between branches the look doesn’t change. This means this sort of change should be done on the oldest published version branch and then merged up the chain to the others, e.g. 4.0 => 4.1 => master.

In some cases a change to the look may require significant refactoring of the content, e.g. changes to a shortcode. In the event of this it may be necessary to only make the change on the latest release branch and for different versions to have a slightly different look. The decision on how best to tackle these situations will have to be on a case by case basis.

Building a mock multi-version site

To make it easier to test how the combined site will look with multiple versions the following script can be run to mock up a multi-version site. It does the following:

  1. Copies the content of the local repository.
  2. Amends the config file to set appropriate versions.
  3. Builds the site for that version.
  4. Copies the built site into a sub-directory matching the version in /tmp/stroom-docs_mock_combined_site/.

To run this script do:

create_mock_combined_site.sh

The combined site can be served using something like the Python simple HTTP server, e.g.

cd /tmp/stroom-docs_mock_combined_site
python3 -m http.server 8888

Then open a browser at localhost:8888.

As each version of the site is a copy of the same thing the content will be all the same but it allows you to test the version drop down and archived banner.

2.3 - Documentation Style Guide

A guide on the house style, structure and content for this site.

Overview

This documentation shares the same style guidelines as the Stroom documentation, so refer to that for guidance on how to style this documentation.