Installation of Stroom Proxy
Assumptions
The following assumptions are used in this document.
- the user has reasonable RHEL/Centos System administration skills.
- installation is on a fully patched minimal Centos 7.3 instance.
- the Stroom database has been created and resides on the host
stroomdb0.strmdev00.org
listening on port 3307. - the Stroom database user is
stroomuser
with a password ofStroompassword1@
. - the application user
stroomuser
has been created. - the user is or has deployed the two node Stroom cluster described here.
- the user has set up the Stroom processing user as described here.
- the prerequisite software has been installed.
- when a screen capture is documented, data entry is identified by the data surrounded by ‘<’ ‘>’ . This excludes enter/return presses.
Confirm Prerequisite Software Installation
The following command will ensure the prerequisite software has been deployed
sudo yum -y install java-1.8.0-openjdk java-1.8.0-openjdk-devel policycoreutils-python unzip zip
sudo yum -y install mariadb
or
sudo yum -y install mysql-community-client
Note that we do NOT need the database client software for a Forwarding or Standalone proxy.
Get the Software
The following will gain the identified, in this case release 5.1-beta.10
, Stroom Application software release from github, then deploy it. You should regularly monitor the site for newer releases.
sudo -i -u stroomuser
Prx=v5.1-beta.10
wget https://github.com/gchq/stroom-proxy/releases/download/${Prx}/stroom-proxy-distribution-${Prx}.zip
unzip stroom-proxy-distribution-${Prx}.zip
Configure the Software
There are three different types of Stroom Proxy
- Store
A store proxy accepts batches of events, as files. It will validate the batch with the database then store the batches as files in a configured directory.
- Store_NoDB
A store_nodb proxy accepts batches of events, as files. It has no connectivity to the database, so it assumes all batches are valid, so it stores the batches as files in a configured directory.
- Forwarding
A forwarding proxy accepts batches of events, as files. It has indirect connectivity to the database via the destination proxy, so it validates the batches then stores the batches as files in a configured directory until they are periodically forwarded to the configured destination Stroom proxy.
We will demonstrate the installation of each.
Store Proxy Configuration
In our Store Proxy description below, we will use the multi node deployment scenario. That is we are deploying the Store proxy on multiple Stroom nodes (stroomp00, stroomp01) and we have configured our storage as per the Storage Scenario which means the directories to install the inbound batches of data are /stroomdata/stroom-working-p00/proxy
and /stroomdata/stroom-working-p01/proxy
depending on the node.
To install a Store proxy, we run
stroom-proxy/bin/setup.sh store
during which one is prompted for a number of configuration settings. Use the following
NODE to be the hostname (not FQDN) of your host (i.e. 'stroomp00' or 'stroomp01' depending on the node we are installing on)
PORT_PREFIX should use the default, just press return
REPO_DIR should be set to '/stroomdata/stroom-working-p00/proxy' or '/stroomdata/stroom-working-p01/proxy' depending on the node we are installing on
REPO_FORMAT can be left as the default, just press return
JDBC_CLASSNAME should use the default, just press return
JDBC_URL should be set to 'jdbc:mysql://stroomdb0.strmdev00.org:3307/stroom'
DB_USERNAME should be our processing user, 'stroomuser'
DB_PASSWORD should be the one we set when creating the stroom database, that is 'Stroompassword1@'
JAVA_OPTS can use the defaults, but ensure you have sufficient memory, either change or accept the default
At this point, the script will configure the proxy. There should be no errors, but review the output. If you make a mistake in the above, just re-run the script.
NOTE: The selection of the REPO_DIR
above and the setting of the STROOM_TMP
environment variable earlier ensure that not only inbound files are placed in the REPO_DIR
location but the Stroom Application itself will access the same directory when it aggregates inbound data for ingest in it’s proxy aggregation threads.
Forwarding Proxy Configuration
In our Forwarding Proxy description below, we will deploy on a host named stroomfp0
and it will store the files in /stroomdata/stroom-working-fp0/proxy
. Remember, we are being consistent with our Storage hierarchy to make documentation and scripting simpler. Our destination host to periodically forward the files to will be stroomp.strmdev00.org
(the CNAME for stroomp00.strmdev00.org
).
To install a Forwarding proxy, we run
stroom-proxy/bin/setup.sh forward
during which one is prompted for a number of configuration settings. Use the following
NODE to be the hostname (not FQDN) of your host (i.e. 'stroomfp0' in our example)
PORT_PREFIX should use the default, just press return
REPO_DIR should be set to '/stroomdata/stroom-working-fp0/proxy' which we created earlier.
REPO_FORMAT can be left as the default, just press return
FORWARD_SERVER should be set to our stroom server. (i.e. 'stroomp.strmdev00.org' in our example)
JAVA_OPTS can use the defaults, but ensure you have sufficient memory, either change or accept the default
At this point, the script will configure the proxy. There should be no errors, but review the output.
Store No Database Proxy Configuration
In our Store_NoDB Proxy description below, we will deploy on a host named stroomsap0
and it will store the files in /stroomdata/stroom-working-sap0/proxy
. Remember, we are being consistent with our Storage hierarchy to make documentation and scripting simpler.
To install a Store_NoDB proxy, we run
stroom-proxy/bin/setup.sh store_nodb
during which one is prompted for a number of configuration settings. Use the following
NODE to be the hostname (not FQDN) of your host (i.e. 'stroomsap0' in our example)
PORT_PREFIX should use the default, just press return
REPO_DIR should be set to '/stroomdata/stroom-working-sap0/proxy' which we created earlier.
REPO_FORMAT can be left as the default, just press return
JAVA_OPTS can use the defaults, but ensure you have sufficient memory, either change or accept the default
At this point, the script will configure the proxy. There should be no errors, but review the output.
Apache/Mod_JK change
For all proxy deployments, if we are using Apache’s mod_jk then we need to ensure the proxy’s AJP connector specifies a 64K packetSize. View the file stroom-proxy/instance/conf/server.xml
to ensure the Connector element for the AJP protocol has a packetSize attribute of 65536
. For example,
grep AJP stroom-proxy/instance/conf/server.xml
shows
<Connector port="9009" protocol="AJP/1.3" connectionTimeout="20000" redirectPort="8443" maxThreads="200" packetSize="65536" />
This check is required for earlier releases of the Stroom Proxy. Releases since v5.1-beta.4
have set the AJP packetSize.
Start the Proxy Service
We can now manually start our proxy service. Do so as the stroomuser
with the command
stroom-proxy/bin/start.sh
Now monitor the directory stroom-proxy/instance/logs
for any errors. Initially you will see the log files localhost_access_log.YYYY-MM-DD.txt
and catalina.out
. Check them for errors and correct (or pose a question to this arena).
The context path and unknown version warnings in catalina.out
can be ignored.
Eventually (about 60 seconds) the log file stroom-proxy/instance/logs/stroom.log
will appear. Again check it for errors.
The proxy will have completely started when you see the messages
INFO [localhost-startStop-1] spring.StroomBeanLifeCycleReloadableContextBeanProcessor (StroomBeanLifeCycleReloadableContextBeanProcessor.java:109) - ** proxyContext 0 START COMPLETE **
and
INFO [localhost-startStop-1] spring.StroomBeanLifeCycleReloadableContextBeanProcessor (StroomBeanLifeCycleReloadableContextBeanProcessor.java:109) - ** webContext 0 START COMPLETE **
If you leave it for a while you will eventually see cyclic (10 minute cycle) messages of the form
INFO [Repository Reader Thread 1] repo.ProxyRepositoryReader (ProxyRepositoryReader.java:170) - run() - Cron Match at YYYY-MM-DD ...
If a proxy takes too long to start, you should read the section on Entropy Issues.
Proxy Repository Format
A Stroom Proxy stores inbound files in a hierarchical file system whose root is supplied during the proxy setup (REPO_DIR
) and as files arrive they are given a repository id that is a one-up number starting at one (1). The files are stored in a specific repository format.
The default template is ${pathId}/${id}
and this pattern will produce the following output files under REPO_DIR
for the given repository id
Repository Id | FilePath |
---|---|
1 | 000.zip |
100 | 100.zip |
1000 | 001/001000.zip |
10000 | 010/010000.zip |
100000 | 100/100000.zip |
Since version v5.1-beta.4, this template can be specified during proxy setup via the entry to the Stroom Proxy Repository Format
prompt
...
@@REPO_FORMAT@@ : Stroom Proxy Repository Format [${pathId}/${id}] >
...
The template uses replacement variables to form the file path. As indicated above, the default template is ${pathId}/${id}
where ${pathId}
is the automatically generated directory for a given repository id and ${id}
is the repository id.
Other replacement variables can be used to in the template including http header meta data parameters (e.g. ‘${feed}’) and time based parameters (e.g. ‘${year}’). Replacement variables that cannot be resolved will be output as ‘_’. You must ensure that all templates include the ‘${id}’ replacement variable at the start of the file name, failure to do this will result in an invalid repository.
Available time based parameters are based on the file’s time of processing and are zero filled (excluding ms
).
Parameter | Description |
---|---|
year | four digit year |
month | two digit month |
day | two digit day |
hour | two digit hour |
minute | two digit minute |
second | two digit second |
millis | three digit milliseconds value |
ms | milliseconds since Epoch value |
Proxy Repository Template Examples
For each of the following templates applied to a Store NoDB Proxy, the resultant proxy directory tree is shown after three posts were sent to the test feed TEST-FEED-V1_0
and two posts to the test feed FEED-NOVALUE-V9_0
Example A - The default - ${pathId}/${id}
[stroomuser@stroomsap0 ~]$ find /stroomdata/stroom-working-sap0/proxy/
/stroomdata/stroom-working-sap0/proxy/
/stroomdata/stroom-working-sap0/proxy/001.zip
/stroomdata/stroom-working-sap0/proxy/002.zip
/stroomdata/stroom-working-sap0/proxy/003.zip
/stroomdata/stroom-working-sap0/proxy/004.zip
/stroomdata/stroom-working-sap0/proxy/005.zip
[stroomuser@stroomsap0 ~]$
Example B - A feed orientated structure - ${feed}/${year}/${month}/${day}/${pathId}/${id}
[stroomuser@stroomsap0 ~]$ find /stroomdata/stroom-working-sap0/proxy/
/stroomdata/stroom-working-sap0/proxy/
/stroomdata/stroom-working-sap0/proxy/TEST-FEED-V1_0
/stroomdata/stroom-working-sap0/proxy/TEST-FEED-V1_0/2017
/stroomdata/stroom-working-sap0/proxy/TEST-FEED-V1_0/2017/07
/stroomdata/stroom-working-sap0/proxy/TEST-FEED-V1_0/2017/07/23
/stroomdata/stroom-working-sap0/proxy/TEST-FEED-V1_0/2017/07/23/001.zip
/stroomdata/stroom-working-sap0/proxy/TEST-FEED-V1_0/2017/07/23/002.zip
/stroomdata/stroom-working-sap0/proxy/TEST-FEED-V1_0/2017/07/23/003.zip
/stroomdata/stroom-working-sap0/proxy/FEED-NOVALUE-V9_0
/stroomdata/stroom-working-sap0/proxy/FEED-NOVALUE-V9_0/2017
/stroomdata/stroom-working-sap0/proxy/FEED-NOVALUE-V9_0/2017/07
/stroomdata/stroom-working-sap0/proxy/FEED-NOVALUE-V9_0/2017/07/23
/stroomdata/stroom-working-sap0/proxy/FEED-NOVALUE-V9_0/2017/07/23/004.zip
/stroomdata/stroom-working-sap0/proxy/FEED-NOVALUE-V9_0/2017/07/23/005.zip
[stroomuser@stroomsap0 ~]$
Example C - A date orientated structure - ${year}/${month}/${day}/${pathId}/${id}
[stroomuser@stroomsap0 ~]$ find /stroomdata/stroom-working-sap0/proxy/
/stroomdata/stroom-working-sap0/proxy/
/stroomdata/stroom-working-sap0/proxy/2017
/stroomdata/stroom-working-sap0/proxy/2017/07
/stroomdata/stroom-working-sap0/proxy/2017/07/23
/stroomdata/stroom-working-sap0/proxy/2017/07/23/001.zip
/stroomdata/stroom-working-sap0/proxy/2017/07/23/002.zip
/stroomdata/stroom-working-sap0/proxy/2017/07/23/003.zip
/stroomdata/stroom-working-sap0/proxy/2017/07/23/004.zip
/stroomdata/stroom-working-sap0/proxy/2017/07/23/005.zip
[stroomuser@stroomsap0 ~]$
Example D - A feed orientated structure, but with a bad parameter - ${feed}/${badparam}/${day}/${pathId}/${id}
[stroomuser@stroomsap0 ~]$ find /stroomdata/stroom-working-sap0/proxy/
/stroomdata/stroom-working-sap0/proxy/
/stroomdata/stroom-working-sap0/proxy/TEST-FEED-V1_0
/stroomdata/stroom-working-sap0/proxy/TEST-FEED-V1_0/_
/stroomdata/stroom-working-sap0/proxy/TEST-FEED-V1_0/_/23
/stroomdata/stroom-working-sap0/proxy/TEST-FEED-V1_0/_/23/001.zip
/stroomdata/stroom-working-sap0/proxy/TEST-FEED-V1_0/_/23/002.zip
/stroomdata/stroom-working-sap0/proxy/TEST-FEED-V1_0/_/23/003.zip
/stroomdata/stroom-working-sap0/proxy/FEED-NOVALUE-V9_0
/stroomdata/stroom-working-sap0/proxy/FEED-NOVALUE-V9_0/_
/stroomdata/stroom-working-sap0/proxy/FEED-NOVALUE-V9_0/_/23
/stroomdata/stroom-working-sap0/proxy/FEED-NOVALUE-V9_0/_/23/004.zip
/stroomdata/stroom-working-sap0/proxy/FEED-NOVALUE-V9_0/_/23/005.zip
[stroomuser@stroomsap0 ~]$
and one would also see a warning for each post in the proxy’s log file of the form
WARN [ajp-apr-9009-exec-4] repo.StroomFileNameUtil (StroomFileNameUtil.java:133) - Unused variables found: [badparam]