
HDFS Operations

These Operations are used with HDFS.

AddElementsFromHdfs

Operation for adding Elements from HDFS. Javadoc

This operation must be run as a Hadoop Job, so you will need to package a shaded jar containing a main method that creates an instance of Graph and executes the operation. It can then be run with:

hadoop jar custom-shaded-jar.jar

When running an AddElementsFromHdfs operation on Accumulo, if you do not specify 'useProvidedSplits' and the Accumulo table does not have a full set of split points, this operation will first sample the input data, generate split points and set them on the Accumulo table. It does this by delegating to the SampleDataForSplitPoints operation and the class uk.gov.gchq.gaffer.operation.impl.SplitStoreFromFile.
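
If the table is already split, or you want to control the split points yourself, you can apply a splits file to the store before the import and set 'useProvidedSplits' to true on AddElementsFromHdfs. A minimal sketch, assuming a splits file already exists in HDFS and reusing the Graph and User from the example below (the inputPath option on the SplitStoreFromFile builder is an assumption here):

// Assumed pre-splitting step: apply an existing splits file to the store,
// so AddElementsFromHdfs does not need to sample the input data first.
graph.execute(new SplitStoreFromFile.Builder()
        .inputPath("/tmp/splits")
        .build(), new User());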

Specifying the number of reducers for the Job has been deprecated; instead, set the minimum and/or maximum number of reducers, as in the example below. Most users should not need to set either and can simply leave the Store to pick the optimal number; the Accumulo Store does this by using the number of tablet servers. If you do set a minimum or maximum, the Store will try to use a number within that range. If there is no optimal number within the provided range, an exception is thrown.

Example adding elements from HDFS - Main Method

Example content for a main method that takes 5 arguments and runs an AddElementsFromHdfs.

if (5 != args.length) {
    System.err.println("Usage: hadoop jar custom-hdfs-import-<version>-shaded.jar <inputPath> <outputPath> <failurePath> <schemaPath> <storePropertiesPath>");
    System.exit(1);
}

final String inputPath = args[0];
final String outputPath = args[1];
final String failurePath = args[2];
final String schemaPath = args[3];
final String storePropertiesPath = args[4];

// Create the Graph from the supplied schema and store properties
final Graph graph = new Graph.Builder()
        .storeProperties(storePropertiesPath)
        .addSchemas(Paths.get(schemaPath))
        .build();

// Build the AddElementsFromHdfs operation, pairing each input path with a mapper generator
final AddElementsFromHdfs operation = new AddElementsFromHdfs.Builder()
        .addInputMapperPair(inputPath, TextMapperGeneratorImpl.class.getName())
        .outputPath(outputPath)
        .failurePath(failurePath)
        .splitsFilePath("/tmp/splits")
        .workingPath("/tmp/workingDir")
        .useProvidedSplits(false)
        .jobInitialiser(new TextJobInitialiser())
        .minReducers(10)
        .maxReducers(100)
        .build();

graph.execute(operation, new User());
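
The example above refers to a TextMapperGeneratorImpl class that you would write yourself to turn each line of input text into Gaffer Elements. A minimal sketch, assuming the input files contain comma-separated lines of the form group,vertex (the class names and the CSV format are illustrative, not part of the operation's API):

public static final class TextMapperGeneratorImpl extends TextMapperGenerator {
    public TextMapperGeneratorImpl() {
        // Delegate each line of text to a one-to-one element generator
        super(new ExampleGenerator());
    }
}

public static final class ExampleGenerator implements OneToOneElementGenerator<String> {
    @Override
    public Element _apply(final String line) {
        // Assumed input format: group,vertex
        final String[] parts = line.split(",");
        return new Entity(parts[0], parts[1]);
    }
}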
