The page that you are currently viewing is for an old version of Stroom (Legacy). The documentation for the latest version of Stroom (7.6) can be found using the version drop-down at the top of the screen or by clicking here.

Installation

This HOWTO is provided to assist users in setting up a number of different Stroom environments based on Centos 7.3 infrastructure.

Assumptions

The following assumptions are used in this document.

  • the user has reasonable RHEL/Centos System administration skills.
  • installations are on Centos 7.3 minimal systems (fully patched).
  • the term ’node’ is used to reference the ‘host’ a service is running on.
  • the Stroom Proxy and Application software runs as user ‘stroomuser’ and will be deployed in this user’s home directory
  • data will reside in a directory tree referenced via ‘/stroomdata’. It is up to the user to provision a filesystem here, noting sub-directories of it will be NFS shared in Multi Node Stroom Deployments
  • any scripts or commands that should run are in code blocks and are designed to allow the user to cut then paste the commands onto their systems
  • in this document, when a textual screen capture is documented, data entry is identified by the data surrounded by ‘<’ ‘>’ . This excludes enter/return presses.
  • better security of password choices, networking, firewalls, data stores, etc. can and should be achieved in various ways, but these HOWTOs are just a quick means of getting a working system, so only limited security is applied
  • better configuration of the database (e.g. more memory. redundancy) should be considered in production environments
  • the use of self signed certificates is appropriate for test systems, but users should consider appropriate CA infrastructure in production environments
  • the user has access to a Chrome (external link) web browser as Stroom is optimised for this browser.

Introduction

This HOWTO provides guidance on a variety of simple Stroom deployments.

for an environment where multiple nodes are required to handle the processing load.

for extensive networks where one wants to aggregate data through a proxy before sending data to the central Stroom processing systems.

for disconnected networks where collected data can be manually transferred to a Stroom processing service.

for when one needs to add an additional node to an existing cluster.

Nodename Nomenclature

For simplicity sake, the nodenames used in this HOWTO are geared towards the Multi Node Stroom Cluster deployment. That is,

  • the database nodename is stroomdb0.strmdev00.org
  • the processing nodenames are stroomp00.strmdev00.org, stroomp01.strmdev00.org, and stroomp02.strmdev00.org
  • the first node in our cluster, stroomp00.strmdev00.org, also has the CNAME stroomp.strmdev00.org

In the case of the Proxy only deployments,

  • the forwarding Stroom proxy nodename is stoomfp0.strmdev00.org
  • the standalone nodename will be stroomp00.strmdev00.org

Storage

Both the Stroom Proxy and Application store data. The typical requirement is

  • directory for Stroom proxy to store inbound data files
  • directory for Stroom application permanent data files (events, etc.)
  • directory for Stroom application index data files
  • directory for Stroom application working files (temporary files, output, etc.)

Where multiple processing nodes are involved, the application’s permanent data directories need to be accessible by all participating nodes.

Thus a hierarchy for a Stroom Proxy might by

  • /stroomdata/stroom-proxy

and for an Application node

  • /stroomdata/stroom-data
  • /stroomdata/stroom-index
  • /stroomdata/stroom-working

In the following examples, the storage hierarchy proposed will more suited for a multi node Stroom cluster, including the Forwarding or Standalone proxy deployments. This is to simplify the documentation. Thus, the above structure is generalised into

  • /stroomdata/stroom-working-p_nn_/proxy

and

  • /stroomdata/stroom-data-p_nn_
  • /stroomdata/stroom-index-p_nn_
  • /stroomdata/stroom-working-p_nn_

where nn is a two digit node number. The reason for placing the proxy directory within the Application working area will be explained later.

All data should be owned by the Stroom processing user. In this HOWTO, we will use stroomuser

Multi Node Stroom Cluster (Proxy and Application) Deployment

In this deployment we will install the database on a given node then deploy both the Stroom Proxy and Stroom Application software to both our processing nodes. At this point we will then integrate a web service to run ‘in-front’ of our Stroom software and then perform the initial configuration of Stroom via the user interface.

Database Installation

The Stroom capability requires access to two MySQL/MariaDB databases. The first is for persisting application configuration and metadata information, and the second is for the Stroom Statistics capability. Instructions for installation of the Stroom databases can be found here. Although these instructions describe the deployment of the databases to their own node, there is no reason why one can’t just install them both on the first (or only) Stroom node.

Prerequisite Software Installation

Certain software packages are required for either the Stroom Proxy or Stroom Application to run.

The core software list is

  • java-1.8.0-openjdk
  • java-1.8.0-openjdk-devel
  • policycoreutils-python
  • unzip
  • zip
  • mariadb or mysql client

Most of the required software are packages available via standard repositories and hence we can simply execute

sudo yum -y install java-1.8.0-openjdk java-1.8.0-openjdk-devel policycoreutils-python unzip zip

One has a choice of database clients. MariaDB is directly supported by Centos 7 and is simplest to install. This is done via

sudo yum -y install mariadb

One could deploy the MySQL database software as the alternative.

To do this you need to install the MySQL Community repository files then install the client. Instructions for installation of the MySQL Community repository files can be found here or on the MySQL Site (external link). Once you have installed the MySQL repository files, install the client via

sudo yum -y install mysql-community-client

Note that additional software will be required for other integration components (e.g. Apache httpd/mod_jk). This is described in the Web Service Integration section of this document.

Note also, that Standalone or Forwarding Stroom Proxy deployments do NOT need a database client deployed.

Entropy Issues in Virtual environments

Both the Stroom Application and Stroom Proxy currently run on Tomcat (Version 7) which relies on the Java SecureRandom class to provide random values for any generated session identifiers as well as other components. In some circumstances the Java runtime can be delayed if the entropy source that is used to initialise SecureRandom is short of entropy. The delay is caused by the Java runtime waiting on the blocking entropy souce /dev/random to have sufficient entropy. This quite often occurs in virtual environments were there are few sources that can contribute to a system’s entropy.

To view the current available entropy on a Linux system, run the command

cat /proc/sys/kernel/random/entropy_avail

A reasonable value would be over 2000 and a poor value would be below a few hundred.

If you are deploying Stroom onto systems with low available entropy, the start time for the Stroom Proxy can be as high as 5 minutes and for the Application as high as 15 minutes.

One software based solution would be to install the haveged (external link) service that attempts to provide an easy-to-use, unpredictable random number generator based upon an adaptation of the HAVEGE algorithm. To install execute

yum -y install haveged
systemctl enable haveged
systemctl start haveged

For background reading in this matter, see this reference (external link) or this reference (external link).

Storage Scenario

For the purpose of this Installation HOWTO, the following sets up the storage hierarchy for a two node processing cluster. To share our permanent data we will use NFS. Accept that the NFS deployment described here is very simple, and in a production deployment, a lot more security controls should be used. Further,

Our hierarchy is

  • Node: stroomp00.strmdev00.org
  • /stroomdata/stroom-data-p00 - location to store Stroom application data files (events, etc.) for this node
  • /stroomdata/stroom-index-p00 - location to store Stroom application index files
  • /stroomdata/stroom-working-p00 - location to store Stroom application working files (e.g. temporary files, output, etc.) for this node
  • /stroomdata/stroom-working-p00/proxy - location for Stroom proxy to store inbound data files
  • Node: stroomp01.strmdev00.org
  • /stroomdata/stroom-data-p01 - location to store Stroom application data files (events, etc.) for this node
  • /stroomdata/stroom-index-p01 - location to store Stroom application index files
  • /stroomdata/stroom-working-p01 - location to store Stroom application working files (e.g. temporary files, output, etc.) for this node
  • /stroomdata/stroom-working-p01/proxy - location for Stroom proxy to store inbound data files

Creation of Storage Hierarchy

So, we first create processing user on all nodes as per

sudo useradd --system stroomuser

And the relevant commands to create the above hierarchy would be

  • Node: stroomp00.strmdev00.org
sudo mkdir -p /stroomdata/stroom-data-p00 /stroomdata/stroom-index-p00 /stroomdata/stroom-working-p00 /stroomdata/stroom-working-p00/proxy
sudo mkdir -p /stroomdata/stroom-data-p01  # So that this node can mount stroomp01's data directory
sudo chown -R stroomuser:stroomuser /stroomdata
sudo chmod -R 750 /stroomdata
  • Node: stroomp01.strmdev00.org
sudo mkdir -p /stroomdata/stroom-data-p01 /stroomdata/stroom-index-p01 /stroomdata/stroom-working-p01 /stroomdata/stroom-working-p01/proxy
sudo mkdir -p /stroomdata/stroom-data-p00  # So that this node can mount stroomp00's data directory
sudo chown -R stroomuser:stroomuser /stroomdata
sudo chmod -R 750 /stroomdata

Deployment of NFS to share Stroom Storage

We will use NFS to cross mount the permanent data directories. That is

  • node stroomp00.strmdev00.org will mount stroomp01.strmdev00.org:/stroomdata/stroom-data-p01 and,
  • node stroomp01.strmdev00.org will mount stroomp00.strmdev00.org:/stroomdata/stroom-data-p00.

The HOWTO guide to deploy and configure NFS for our Scenario is here

Stroom Installation

Pre-installation setup

Before installing either the Stroom Proxy or Stroom Application, we need establish various files and scripts within the Stroom Processing user’s home directory to support the Stroom services and their persistence. This is setup is described here.

Stroom Proxy Installation

Instructions for installation of the Stroom Proxy can be found here.

Stroom Application Installation

Instructions for installation of the Stroom application can be found here.

Web Service Integration

One typically ‘fronts’ either a Stroom Proxy or Stroom Application with a secure web service such as Apache’s Httpd or NGINX. In our scenario, we will use SSL to secure the web service and further, we will use Apache’s Httpd.

We first need to create certificates for use by the web service. The following provides instructions for this. The created certificates can then be used when configuration the web service.

This HOWTO is designed to deploy Apache’s httpd web service as a front end (https) (to the user) and Apache’s mod_jk as the interface between Apache and the Stroom tomcat applications. The instructions to configure this can be found here.

Other Web service capability can be used, for example, NGINX (external link).

Installation Validation

We will now check that the installation and web services integration has worked.

Sanity firewall check

To ensure you have the firewall correctly set up, the following command

sudo firewall-cmd --reload
sudo firewall-cmd --zone=public --list-all

should result in

public (active)
  target: default
  icmp-block-inversion: no
  interfaces: enp0s3
  sources: 
  services: dhcpv6-client http https nfs ssh
  ports: 8009/tcp 9080/tcp 8080/tcp 9009/tcp
  protocols: 
  masquerade: no
  forward-ports: 
  sourceports: 
  icmp-blocks: 
  rich rules: 

Test Posting of data to the Stroom service

You can test the data posting service with the command

curl -k --data-binary @/etc/group "https://stroomp.strmdev00.org/stroom/datafeed" -H "Feed:TEST-FEED-V1_0" -H "System:EXAMPLE_SYSTEM" -H "Environment:EXAMPLE_ENVIRONMENT"

which WILL result in an error as we have not configured the Stroom Application as yet. The error should look like

<html><head><title>Apache Tomcat/7.0.53 - Error report</title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}HR {color : #525D76;}--></style> </head><body><h1>HTTP Status 406 - Stroom Status 110 - Feed is not set to receive data - </h1><HR size="1" noshade="noshade"><p><b>type</b> Status report</p><p><b>message</b> <u>Stroom Status 110 - Feed is not set to receive data - </u></p><p><b>description</b> <u>The resource identified by this request is only capable of generating responses with characteristics not acceptable according to the request "accept" headers.</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/7.0.53</h3></body></html>

If you view the Stroom proxy log, ~/stroom-proxy/instance/logs/stroom.log, on both processing nodes, you will see on one node, the datafeed.DataFeedRequestHandler events running under, in this case, the ajp-apr-9009-exec-1 thread indicating the failure

...
2017-01-03T03:35:47.366Z WARN  [ajp-apr-9009-exec-1] datafeed.DataFeedRequestHandler (DataFeedRequestHandler.java:131) - "handleException()","Environment=EXAMPLE_ENVIRONMENT","Expect=100-continue","Feed=TEST-FEED-V1_0","GUID=39960cf9-e50b-4ae8-a5f2-449ee670d2eb","ReceivedTime=2017-01-03T03:35:46.915Z","RemoteAddress=192.168.2.220","RemoteHost=192.168.2.220","System=EXAMPLE_SYSTEM","accept=*/*","content-length=1051","content-type=application/x-www-form-urlencoded","host=stroomp.strmdev00.org","user-agent=curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.21 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2","Stroom Status 110 - Feed is not set to receive data"
2017-01-03T03:35:47.367Z ERROR [ajp-apr-9009-exec-1] zip.StroomStreamException (StroomStreamException.java:131) - sendErrorResponse() - 406 Stroom Status 110 - Feed is not set to receive data - 
2017-01-03T03:35:47.368Z INFO  [ajp-apr-9009-exec-1] datafeed.DataFeedRequestHandler$1 (DataFeedRequestHandler.java:104) - "doPost() - Took 478 ms to process (concurrentRequestCount=1) 406","Environment=EXAMPLE_ENVIRONMENT","Expect=100-continue","Feed=TEST-FEED-V1_0","GUID=39960cf9-e50b-4ae8-a5f2-449ee670d2eb","ReceivedTime=2017-01-03T03:35:46.915Z","RemoteAddress=192.168.2.220","RemoteHost=192.168.2.220","System=EXAMPLE_SYSTEM","accept=*/*","content-length=1051","content-type=application/x-www-form-urlencoded","host=stroomp.strmdev00.org","user-agent=curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.21 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
...

Further, if you execute the data posting command (curl) multiple times, you will see the loadbalancer working in that, the above WARN/ERROR/INFO logs will swap between the proxy services (i.e. first error will be in stroomp00.strmdev00.org’s proxy log file, then second on stroomp01.strmdev00.org’s proxy log file, then back to stroomp00.strmdev00.org and so on).

Stroom Application Configuration

Although we have installed our multi node Stroom cluster, we now need to configure it. We do this via the user interface (UI).

Logging into the Stroom UI for the first time

To log into the UI of your newly installed Stroom instance, present the base URL to your Chrome (external link) browser. In this deployment, you should enter the URLS http://stroomp.strmdev00.org, or https://stroomp.strmdev00.org or https://stroomp.strmdev00.org/stroom, noting the first URLs should automatically direct you to the last URL.

If you have personal certificates loaded in your Chrome browser, you may be asked which certificate to use to authenticate yourself to stroomp.strmdev00.org:443. As Stroom has not been configured to use user certificates, the choice is not relevant, just choose one and continue.

Additionally, if you are using self-signed certificates, your browser will generate an alert as per

images/HOWTOs/UI-Chrome-NoCa-00.png

Self Signed Certificate Initial Warning

To proceed you need to select the ADVANCED hyperlink to see

images/HOWTOs/UI-Chrome-NoCa-01.png

Self Signed Certificate Advanced Warning

If you select the Proceed to stroomp.strmdev00.org (unsafe) hyper-link you will be presented with the standard Stroom UI login page.

images/HOWTOs/UI-Login-00.png

Stroom UI Login Page

This page has two panels - About Stroom and Login.

In the About Stroom panel we see an introductory description of Stroom in the top left and deployment details in the bottom left of the panel. The deployment details provide

  • Build Version: - the build version of the Stroom application deployed
  • Build Date: - the date the version was built
  • Up Date: - the install date
  • Node Name: - the node within the Stroom cluster you have connected to

Login with Stroom default Administrative User

Each new Stroom deployment automatically creates the administrative user admin and this user’s password is initially set to admin. We will login as this user which also validates that the database and UI is working correctly in that you can login and the password is admin.

Create an Attributed User to perform configuration

We should configure Stroom using an attributed user account. That is, we should create a user, in our case it will be burn (the author) and once created, we login with that account then perform the initial configuration activities. You don’t have to do this, but it is sound security practice.

Once you have created the user you should log out of the admin account and log back in as our user burn.

Configure the Volumes for our Stroom deployment

Before we can store data within Stroom we need to configure the volumes we have allocated in our Storage hierarchy. The Volume Maintenance HOWTO shows how to do this.

Configure the Nodes for our Stroom deployment

In a Stroom cluster, nodes are expected to communicate with each other on port 8080 over http. Our installation in a multi node environment ensures the firewall will allow this but we also need to configure the nodes. This is achieved via the Stroom UI where we set a Cluster URL for each node. The following Node Configuration HOWTO demonstrates how do set the Cluster URL.

Data Stream Processing

To enable Stroom to process data, it’s Data Processors need to be enabled. There are NOT enabled by default on installation. The following section in our Stroom Tasks HowTo shows how to do this.

Testing our Stroom Application and Proxy Installation

To complete the installation process we will test that we can send and ingest data.

Add a Test Feed

In order for Stroom to be able to handle various data sources, be they Apache HTTPD web access logs, MicroSoft Windows Event logs or Squid Proxy logs, Stroom must be told what the data is when it is received. This is achieved using Event Feeds. Each feed has a unique name within the system.

To test our installation can accept and ingest data, we will create a test Event feed. The ’name’ of the feed will be TEST-FEED-V1_0. Note that in a production environment is is best that a well defined nomenclature is used for feed ’names’. For our testing purposes TEST-FEED-V1_0 is sufficient.

Sending Test Data

NOTE: Before testing our new feed, we should restart both our Stroom application services so that any volume changes are propagated. This can be achieved by simply running

sudo -i -u stroomuser
bin/StopServices.sh
bin/StartServices.sh

on both nodes. It is suggested you first log out of Stroom, if you are currently logged in and you should monitor the Stroom application logs to ensure it has successfully restarted. Remember to use the T and Tp bash aliases we set up.

For this test, we will send the contents of /etc/group to our test feed. We will also send the file from the cluster’s database machine. The command to send this file is

curl -k --data-binary @/etc/group "https://stroomp.strmdev00.org/stroom/datafeed" -H "Feed:TEST-FEED-V1_0" -H "System:EXAMPLE_SYSTEM" -H "Environment:EXAMPLE_ENVIRONMENT"

We will test a number of features as part of our installation test. These are

  • simple post of data
  • simple post of data to validate load balancing is working
  • simple post to direct feed interface
  • simple post to direct feed interface to validate load balancing is working
  • identify that the Stroom Proxy Aggregation is working correctly

As part of our testing will check the presence of the inbound data, as files, within the proxy storage area. Now as the proxy storage area is also the location from which the Stroom application automatically aggregates then ingests the data stored by the proxy, we can either turn off the Proxy Aggregation task, or attempt to perform our tests noting that proxy aggregation occurs every 10 minutes by default. For simplicity, we will turn off the Proxy Aggregation task.

We can now perform out tests. Follow the steps in the Data Posting Tests section of the Testing Stroom Installation HOWTO

Forwarding Stroom Proxy Deployment

In this deployment will install a Stroom Forwarding Proxy which is designed to aggregate data posted to it for managed forwarding to a central Stroom processing system. This scenario is assuming we are installing on the fully patch Centos 7.3 host, stroomfp0.strmdev00.org. Further it assumes we have installed, configured and tested the destination Stroom system we will be forwarding to.

We will first deploy the Stroom Proxy then configure it as a Forwarding Proxy then integrate a web service to run ‘in-front’ of Proxy.

Prerequisite Software Installation for Forwarding Proxy

Certain software packages are required for the Stroom Proxy to run.

The core software list is

  • java-1.8.0-openjdk
  • java-1.8.0-openjdk-devel
  • policycoreutils-python
  • unzip
  • zip

Most of the required software are packages available via standard repositories and hence we can simply execute

sudo yum -y install java-1.8.0-openjdk java-1.8.0-openjdk-devel policycoreutils-python unzip zip

Note that additional software will be required for other integration components (e.g. Apache httpd/mod_jk). This is described in the Web Service Integration for Forwarding Proxy section of this document.

Forwarding Proxy Storage

Since we are a proxy that stores data sent to it and forwards it each minute we have only one directory.

  • /stroomdata/stroom-working-fp0/proxy - location for Stroom proxy to store inbound data files prior to forwarding

You will note that these HOWTOs use a consistent storage nomenclature for simplicity of documentations.

Creation of Storage for Forwarding Proxy

We create the processing user, as per

sudo useradd --system stroomuser

then create the storage hierarchy with the commands

sudo mkdir -p /stroomdata/stroom-working-fp0/proxy
sudo chown -R stroomuser:stroomuser /stroomdata
sudo chmod -R 750 /stroomdata

Stroom Forwarding Proxy Installation

Pre-installation setup

Before installing the Stroom Forwarding Proxy, we need establish various files and scripts within the Stroom Processing user’s home directory to support the Stroom services and their persistence. This is setup is described here. Although this setup HOWTO is orientated towards a complete Stroom Proxy and Application installation, it does provide all the processing user setup requirements for a Stroom Proxy as well.

Stroom Forwarding Proxy Installation

Instructions for installation of the Stroom Proxy can be found here, noting you should follow the steps for configuring the proxy as a Forwarding proxy.

Web Service Integration for Forwarding Proxy

One typically ‘fronts’ a Stroom Proxy with a secure web service such as Apache’s Httpd or NGINX. In our scenario, we will use SSL to secure the web service and further, we will use Apache’s Httpd.

We first need to create certificates for use by the web service. The SSL Certificate Generation HOWTO provides instructions for this. The created certificates can then be used when configuration the web service. NOTE also, that for a forwarding proxy we will need to establish Key and Trust stores as well. This is also documented in the SSL Certificate Generation HOWTO here

This HOWTO is designed to deploy Apache’s httpd web service as a front end (https) (to the user) and Apache’s mod_jk as the interface between Apache and the Stroom tomcat applications. The instructions to configure this can be found here. Please take note of where a Stroom Proxy configuration item is different to that of a Stroom Application processing node.

Other Web service capability can be used, for example, NGINX (external link).

Testing our Forwarding Proxy Installation

To complete the installation process we will test that we can send data to the forwarding proxy and that it forwards the files it receives to the central Stroom processing system. As stated earlier, it is assumed we have installed, configured and tested the destination central Stroom processing system and thus we will have a test Feed already established - TEST-FEED-V1_0.

Sending Test Data

For this test, we will send the contents of /etc/group to our test feed - TEST-FEED-V1_0. It doesn’t matter from which host we send the file from. The command to send file is

curl -k --data-binary @/etc/group "https://stroomfp0.strmdev00.org/stroom/datafeed" -H "Feed:TEST-FEED-V1_0" -H "System:EXAMPLE_SYSTEM" -H "Environment:EXAMPLE_ENVIRONMENT"

Before testing, it is recommended you set up to monitor the Stroom proxy logs on the central server as well as on the Forwarding Proxy server.

Follow the steps in the Forwarding Proxy Data Posting Tests section of the Testing Stroom Installation HOWTO

Standalone Stroom Proxy Deployment

In this deployment will install a Stroom Standalone Proxy which is designed to accept and store data posted to it for manual forwarding to a central Stroom processing system. This scenario is assuming we are installing on the fully patch Centos 7.3 host, stroomsap0.strmdev00.org.

We will first deploy the Stroom Proxy then configure it as a Standalone Proxy then integrate a web service to run ‘in-front’ of Proxy.

Prerequisite Software Installation for Forwarding Proxy

Certain software packages are required for the Stroom Proxy to run.

The core software list is

  • java-1.8.0-openjdk
  • java-1.8.0-openjdk-devel
  • policycoreutils-python
  • unzip
  • zip

Most of the required software are packages available via standard repositories and hence we can simply execute

sudo yum -y install java-1.8.0-openjdk java-1.8.0-openjdk-devel policycoreutils-python unzip zip

Note that additional software will be required for other integration components (e.g. Apache httpd/mod_jk). This is described in the Web Service Integration for Standalone Proxy section of this document.

Standalone Proxy Storage

Since we are a proxy that stores data sent to it we have only one directory.

  • /stroomdata/stroom-working-sap0/proxy - location for Stroom proxy to store inbound data files

You will note that these HOWTOs use a consistent storage nomenclature for simplicity of documentations.

Creation of Storage for Standalone Proxy

We create the processing user, as per

sudo useradd --system stroomuser

then create the storage hierarchy with the commands

sudo mkdir -p /stroomdata/stroom-working-sap0/proxy
sudo chown -R stroomuser:stroomuser /stroomdata
sudo chmod -R 750 /stroomdata

Stroom Standalone Proxy Installation

Pre-installation setup

Before installing the Stroom Standalone Proxy, we need establish various files and scripts within the Stroom Processing user’s home directory to support the Stroom services and their persistence. This is setup is described here. Although this setup HOWTO is orientated towards a complete Stroom Proxy and Application installation, it does provide all the processing user setup requirements for a Stroom Proxy as well.

Stroom Standalone Proxy Installation

Instructions for installation of the Stroom Proxy can be found here, noting you should follow the steps for configuring the proxy as a Store_NoDB proxy.

Web Service Integration for Standalone Proxy

One typically ‘fronts’ a Stroom Proxy with a secure web service such as Apache’s Httpd or NGINX. In our scenario, we will use SSL to secure the web service and further, we will use Apache’s Httpd.

We first need to create certificates for use by the web service. The SSL Certificate Generation HOWTO provides instructions for this. The created certificates can then be used when configuration the web service. There is no need for Trust or Key stores.

This HOWTO is designed to deploy Apache’s httpd web service as a front end (https) (to the user) and Apache’s mod_jk as the interface between Apache and the Stroom tomcat applications. The instructions to configure this can be found here. Please take note of where a Stroom Proxy configuration item is different to that of a Stroom Application processing node.

Other Web service capability can be used, for example, NGINX (external link).

Testing our Standalone Proxy Installation

To complete the installation process we will test that we can send data to the standalone proxy and it stores it.

Sending Test Data

For this test, we will send the contents of /etc/group to our test feed - TEST-FEED-V1_0. It doesn’t matter from which host we send the file from. The command to send file is

curl -k --data-binary @/etc/group "https://stroomsap0.strmdev00.org/stroom/datafeed" -H "Feed:TEST-FEED-V1_0" -H "System:EXAMPLE_SYSTEM" -H "Environment:EXAMPLE_ENVIRONMENT"

Before testing, it is recommended you set up to monitor the Standalone Proxy logs.

Follow the steps in the Standalone Proxy Data Posting Tests section of the Testing Stroom Installation HOWTO

Addition of a Node to a Stroom Cluster Deployment

In this deployment we will deploy both the Stroom Proxy and Stroom Application software to a new processing node we wish to add to our cluster. Once we have deploy and configured the Stroom software, we will then integrate a web service to run ‘in-front’ of our Stroom software, and then perform the initial configuration of to add this node via the user interface. The node we will add is stroomp02.strmdev00.org.

Grant access to the database for this node

Connect to the Stroom database as the administrative (root) user, via the command

sudo mysql --user=root -p

and at the MariaDB [(none)]> or mysql> prompt enter

grant all privileges on stroom.* to stroomuser@stroomp02.strmdev00.org identified by 'Stroompassword1@';
quit;

Prerequisite Software Installation

Certain software packages are required for either the Stroom Proxy or Stroom Application to run.

The core software list is

  • java-1.8.0-openjdk
  • java-1.8.0-openjdk-devel
  • policycoreutils-python
  • unzip
  • zip
  • mariadb or mysql client

Most of the required software are packages available via standard repositories and hence we can simply execute

sudo yum -y install java-1.8.0-openjdk java-1.8.0-openjdk-devel policycoreutils-python unzip zip
sudo yum -y install mariadb

In the above instance, the database client choice is MariaDB as it is directly supported by Centos 7. One could deploy the MySQL database software as the alternative. If you have chosen a different database for the already deployed Stroom Cluster then you should use that one. See earlier in this document on how to install the MySQL Community client.

Note that additional software will be required for other integration components (e.g. Apache httpd/mod_jk). This is described in the Web Service Integration section of this document.

Storage Scenario

To maintain our Storage Scenario them, the scenario for this node is

  • Node: stroomp02.strmdev00.org
  • /stroomdata/stroom-data-p02 - location to store Stroom application data files (events, etc.) for this node
  • /stroomdata/stroom-index-p02 - location to store Stroom application index files
  • /stroomdata/stroom-working-p02 - location to store Stroom application working files (e.g. tmp, output, etc.) for this node
  • /stroomdata/stroom-working-p02/proxy - location for Stroom proxy to store inbound data files

Creation of Storage Hierarchy

So, we first create processing user on our new node as per

sudo useradd --system stroomuser

then create the storage via

sudo mkdir -p /stroomdata/stroom-data-p02 /stroomdata/stroom-index-p02 /stroomdata/stroom-working-p02 /stroomdata/stroom-working-p02/proxy
sudo mkdir -p /stroomdata/stroom-data-p00  # So that this node can mount stroomp00's data directory
sudo mkdir -p /stroomdata/stroom-data-p01  # So that this node can mount stroomp01's data directory
sudo chown -R stroomuser:stroomuser /stroomdata
sudo chmod -R 750 /stroomdata

As we need to share this new nodes permanent data directories to the existing nodes in the Cluster, we need to create mount point directories on our existing nodes in addition to deploying NFS.

So we execute on

  • Node: stroomp00.strmdev00.org
sudo mkdir -p /stroomdata/stroom-data-p02
sudo chmod 750 /stroomdata/stroom-data-p02
sudo chown stroomuser:stroomuser /stroomdata/stroom-data-p02

and on

  • Node: stroomp01.strmdev00.org
sudo mkdir -p /stroomdata/stroom-data-p02
sudo chmod 750 /stroomdata/stroom-data-p02
sudo chown stroomuser:stroomuser /stroomdata/stroom-data-p02

Deployment of NFS to share Stroom Storage

We will use NFS to cross mount the permanent data directories. That is

  • node stroomp00.strmdev00.org will mount
    • stroomp01.strmdev00.org:/stroomdata/stroom-data-p01 and,
    • stroomp02.strmdev00.org:/stroomdata/stroom-data-p02 and,
  • node stroomp01.strmdev00.org will mount
    • stroomp00.strmdev00.org:/stroomdata/stroom-data-p00 and
    • stroomp02.strmdev00.org:/stroomdata/stroom-data-p02
  • node stroomp02.strmdev00.org will mount
    • stroomp00.strmdev00.org:/stroomdata/stroom-data-p00 and
    • stroomp01.strmdev00.org:/stroomdata/stroom-data-p01

The HOWTO guide to deploy and configure NFS for our Scenario is here.

Stroom Installation

Pre-installation setup

Before installing either the Stroom Proxy or Stroom Application, we need establish various files and scripts within the Stroom Processing user’s home directory to support the Stroom services and their persistence. This is setup is described here. Note you should remember to set the N bash variable when generating the Environment Variable files to 02.

Stroom Proxy Installation

Instructions for installation of the Stroom Proxy can be found here. Note you will be deploying a Store proxy and during the setup execution ensure you enter the appropriate values for NODE (‘stroomp02’) and REPO_DIR (’/stroomdata/stroom-working-p02/proxy’). All other values will be the same.

Stroom Application Installation

Instructions for installation of the Stroom application can be found here. When executing the setup script ensure you enter the appropriate values for TEMP_DIR (’/stroomdata/stroom-working-p02’) and NODE (‘stroomp02’). All other values will be the same. Note also that you will not have to wait for the ‘first’ node to initialise the Stroom database as this would have already been done when you first deployed your Stroom Cluster.

Web Service Integration

One typically ‘fronts’ either a Stroom Proxy or Stroom Application with a secure web service such as Apache’s Httpd or NGINX. In our scenario, we will use SSL to secure the web service and further, we will use Apache’s Httpd.

As we are a cluster, we use the same certificate as the other nodes. Thus we need to gain the certificate package from an existing node.

So, on stroomp00.strmdev00.org, we replicate the directory ~stroomuser/stroom-jks to our new node. That is, tar it up, copy the tar file to stroomp02 and untar it. We can make use of the other node’s mounted file system.

sudo -i -u stroomuser
cd ~stroomuser
tar cf stroom-jks.tar stroom-jks
mv stroom-jks.tar /stroomdata/stroom-data-p02

then on our new node (stroomp02.strmdev00.org) we extract the data.

sudo -i -u stroomuser
cd ~stroomuser
tar xf /stroomdata/stroom-data-p02/stroom-jks.tar && rm -f /stroomdata/stroom-data-p02/stroom-jks.tar

Now ensure protection, ownership and SELinux context for these files by running

chmod 700 ~stroomuser/stroom-jks/private ~stroomuser/stroom-jks
chown -R stroomuser:stroomuser ~stroomuser/stroom-jks
chcon -R --reference /etc/pki ~stroomuser/stroom-jks

This HOWTO is designed to deploy Apache’s httpd web service as a front end (https) (to the user) and Apache’s mod_jk as the interface between Apache and the Stroom tomcat applications. The instructions to configure this can be found here. You should pay particular attention to the section on the Apache Mod_JK configuration as you MUST regenerate the Mod_JK workers.properties file on the existing cluster nodes as well as generating it on our new node.

Other Web service capability can be used, for example, NGINX (external link).

Note that once you have integrated the web services for our new node, you will need to restart the Apache systemd process on the existing two nodes that that the new Mod_JK configuration has taken place.

Installation Validation

We will now check that the installation and web services integration has worked. We do this with a simple firewall check and later perform complete integration tests.

Sanity firewall check

To ensure you have the firewall correctly set up, the following command

sudo firewall-cmd --reload
sudo firewall-cmd --zone=public --list-all

should result in

public (active)
  target: default
  icmp-block-inversion: no
  interfaces: enp0s3
  sources: 
  services: dhcpv6-client http https nfs ssh
  ports: 8009/tcp 9080/tcp 8080/tcp 9009/tcp
  protocols: 
  masquerade: no
  forward-ports: 
  sourceports: 
  icmp-blocks: 
  rich rules: 

Stroom Application Configuration - New Node

We will need to configure this new node’s volumes, set it’s Cluster URL and enable it’s Stream Processors. We do this by logging into the Stroom User Interface (UI) with an account with Administrator privileges. It is recommended you use a attributed user for this activity. Once you have logged in you can configure this new node.

Configure the Volumes for our Stroom deployment

Before we can store data on this new Stroom node we need to configure it’s volumes we have allocated in our Storage hierarchy. The section on adding new volumes in the Volume Maintenance HOWTO shows how to do this.

Configure the Nodes for our Stroom deployment

In a Stroom cluster, nodes are expected to communicate with each other on port 8080 over http. Our installation in a multi node environment ensures the firewall will allow this but we also need to configure the new node. This is achieved via the Stroom UI where we set a Cluster URL for our node. The section on Configuring a new node in the Node Configuration HOWTO demonstrates how do set the Cluster URL.

Data Stream Processing

To enable Stroom to process data, it’s Data Processors need to be enabled. There are NOT enabled by default on installation. The following section in our Stroom Tasks HowTo shows how to do this.

Testing our New Node Installation

To complete the installation process we will test that our new node has successfully integrated into our cluster.

First we need to ensure we have restarted the Apache Httpd service (httpd.service) on the original nodes so that the new workers.properties configuration files take effect.

We now test the node integration by running the tests we use to validate a Multi Node Stroom Cluster Deployment found here noting we should monitor all three nodes proxy and application log files. Basically we are looking to see that this new node participates in the load balancing for the stroomp.strmdev00.org cluster.

Last modified November 26, 2024: Make font paths relative (4afb976)