Class SampleDataForSplitPoints

- java.lang.Object
  - uk.gov.gchq.gaffer.hdfs.operation.SampleDataForSplitPoints

All Implemented Interfaces:
Closeable, AutoCloseable, MapReduce, Operation

public class SampleDataForSplitPoints extends Object implements Operation, MapReduce

The SampleDataForSplitPoints operation creates a splits file, for use in either a SplitStoreFromFile operation or an AddElementsFromHdfs operation. The operation requires an input path and an output path, as well as the path of the file to write the resulting SplitsFile to. For each input file you must also provide a MapperGenerator class name as part of a pair (input, mapperGeneratorClassName). In order to be generic and deal with any type of input file you also need to provide a JobInitialiser.

NOTE: currently this job has to be run as a Hadoop job.

See Also:
SampleDataForSplitPoints.Builder
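Putting the description together, a typical configuration via the setters listed in the method summary might look like the following sketch. The paths, the MapperGenerator class name, and the import location of TextJobInitialiser are illustrative assumptions, not values taken from this page:

```java
import uk.gov.gchq.gaffer.hdfs.operation.SampleDataForSplitPoints;
import uk.gov.gchq.gaffer.hdfs.operation.handler.job.initialiser.TextJobInitialiser;

import java.util.HashMap;
import java.util.Map;

public class SampleDataForSplitPointsExample {
    public static void main(String[] args) {
        // Map each input path to the MapperGenerator class that can parse it.
        // Both the path and the class name below are hypothetical.
        final Map<String, String> inputMapperPairs = new HashMap<>();
        inputMapperPairs.put("/data/input",
                "uk.gov.gchq.gaffer.hdfs.operation.mapper.generator.TextMapperGenerator");

        final SampleDataForSplitPoints op = new SampleDataForSplitPoints();
        op.setInputMapperPairs(inputMapperPairs);
        op.setOutputPath("/data/output");
        op.setSplitsFilePath("/data/splits/sample.splits"); // where the splits file is written
        op.setJobInitialiser(new TextJobInitialiser());     // text input; see also AvroJobInitialiser
        op.setProportionToSample(0.01f);                    // sample 1% of the input records
        op.setNumSplits(100);                               // target number of split points
        op.setValidate(true);
    }
}
```

Building the operation only describes the work; as noted above, the job itself currently has to be run as a Hadoop job.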
-
-
Nested Class Summary

Nested Classes:
static class SampleDataForSplitPoints.Builder

Nested classes/interfaces inherited from interface uk.gov.gchq.gaffer.operation.Operation:
Operation.BaseBuilder<OP extends Operation,B extends Operation.BaseBuilder<OP,?>>
-
-
Constructor Summary

Constructors:
SampleDataForSplitPoints()
-
Method Summary

All Methods · Instance Methods · Concrete Methods

String[] getCommandLineArgs()
Class<? extends org.apache.hadoop.io.compress.CompressionCodec> getCompressionCodec()
Map<String,String> getInputMapperPairs()
JobInitialiser getJobInitialiser()
    A job initialiser allows additional job initialisation to be carried out in addition to that done by the store.
Integer getMaxMapTasks()
Integer getMaxReduceTasks()
Integer getMinMapTasks()
Integer getMinReduceTasks()
Integer getNumMapTasks()
Integer getNumSplits()
Map<String,String> getOptions()
String getOutputPath()
Class<? extends org.apache.hadoop.mapreduce.Partitioner> getPartitioner()
float getProportionToSample()
String getSplitsFilePath()
boolean isUseProvidedSplits()
boolean isValidate()
void setCommandLineArgs(String[] commandLineArgs)
void setCompressionCodec(Class<? extends org.apache.hadoop.io.compress.CompressionCodec> compressionCodec)
void setInputMapperPairs(Map<String,String> inputMapperPairs)
void setJobInitialiser(JobInitialiser jobInitialiser)
void setMaxMapTasks(Integer maxMapTasks)
void setMaxReduceTasks(Integer maxReduceTasks)
void setMinMapTasks(Integer minMapTasks)
void setMinReduceTasks(Integer minReduceTasks)
void setNumMapTasks(Integer numMapTasks)
void setNumSplits(Integer numSplits)
void setOptions(Map<String,String> options)
void setOutputPath(String outputPath)
void setPartitioner(Class<? extends org.apache.hadoop.mapreduce.Partitioner> partitioner)
void setProportionToSample(float proportionToSample)
void setSplitsFilePath(String splitsFilePath)
void setUseProvidedSplits(boolean useProvidedSplits)
void setValidate(boolean validate)
SampleDataForSplitPoints shallowClone()
    Operation implementations should ensure a shallowClone method is implemented.
uk.gov.gchq.koryphe.ValidationResult validate()
    Validates an operation.
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface uk.gov.gchq.gaffer.hdfs.operation.MapReduce
addInputMapperPair, addInputMapperPairs
-
Methods inherited from interface uk.gov.gchq.gaffer.operation.Operation
_getNullOrOptions, addOption, close, containsOption, getOption, getOption, validateRequiredFieldPresent
-
-
-
-
Method Detail
-
validate
public uk.gov.gchq.koryphe.ValidationResult validate()
Description copied from interface: Operation
Validates an operation. This should be used to validate that fields have been configured correctly. By default no validation is applied. Override this method to implement validation.
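As a hypothetical usage sketch (the ValidationResult accessors isValid() and getErrorString() are assumed from Koryphe; they are not documented on this page):

```java
import uk.gov.gchq.gaffer.hdfs.operation.SampleDataForSplitPoints;
import uk.gov.gchq.koryphe.ValidationResult;

public class ValidateExample {
    public static void main(String[] args) {
        // A freshly constructed operation, before any fields have been set.
        final SampleDataForSplitPoints op = new SampleDataForSplitPoints();
        final ValidationResult result = op.validate();
        if (!result.isValid()) {
            // Report which required fields are missing or misconfigured.
            System.err.println("Invalid operation: " + result.getErrorString());
        }
    }
}
```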
-
isValidate
public boolean isValidate()
-
setValidate
public void setValidate(boolean validate)
-
getSplitsFilePath
public String getSplitsFilePath()
- Specified by:
getSplitsFilePath
in interface MapReduce
-
setSplitsFilePath
public void setSplitsFilePath(String splitsFilePath)
- Specified by:
setSplitsFilePath
in interface MapReduce
-
getNumSplits
public Integer getNumSplits()
-
setNumSplits
public void setNumSplits(Integer numSplits)
-
getProportionToSample
public float getProportionToSample()
-
setProportionToSample
public void setProportionToSample(float proportionToSample)
-
getInputMapperPairs
public Map<String,String> getInputMapperPairs()
- Specified by:
getInputMapperPairs
in interface MapReduce
-
setInputMapperPairs
public void setInputMapperPairs(Map<String,String> inputMapperPairs)
- Specified by:
setInputMapperPairs
in interface MapReduce
-
getOutputPath
public String getOutputPath()
- Specified by:
getOutputPath
in interface MapReduce
-
setOutputPath
public void setOutputPath(String outputPath)
- Specified by:
setOutputPath
in interface MapReduce
-
getJobInitialiser
public JobInitialiser getJobInitialiser()
Description copied from interface: MapReduce
A job initialiser allows additional job initialisation to be carried out in addition to that done by the store. Most stores will probably require the Job Input to be configured in this initialiser, as this is specific to the type of data store in HDFS. For Avro data see AvroJobInitialiser. For Text data see TextJobInitialiser.
- Specified by:
getJobInitialiser
in interface MapReduce
- Returns:
- the job initialiser
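For example, a sketch of wiring in an initialiser (the import location of TextJobInitialiser is an assumption):

```java
import uk.gov.gchq.gaffer.hdfs.operation.SampleDataForSplitPoints;
import uk.gov.gchq.gaffer.hdfs.operation.handler.job.initialiser.TextJobInitialiser;

public class JobInitialiserExample {
    public static void main(String[] args) {
        final SampleDataForSplitPoints op = new SampleDataForSplitPoints();
        // Configure the Job Input for line-oriented text files.
        // For Avro data, an AvroJobInitialiser would be used instead.
        op.setJobInitialiser(new TextJobInitialiser());
        System.out.println(op.getJobInitialiser().getClass().getSimpleName());
    }
}
```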
-
setJobInitialiser
public void setJobInitialiser(JobInitialiser jobInitialiser)
- Specified by:
setJobInitialiser
in interface MapReduce
-
getNumMapTasks
public Integer getNumMapTasks()
- Specified by:
getNumMapTasks
in interface MapReduce
-
setNumMapTasks
public void setNumMapTasks(Integer numMapTasks)
- Specified by:
setNumMapTasks
in interface MapReduce
-
getMinMapTasks
public Integer getMinMapTasks()
- Specified by:
getMinMapTasks
in interface MapReduce
-
setMinMapTasks
public void setMinMapTasks(Integer minMapTasks)
- Specified by:
setMinMapTasks
in interface MapReduce
-
getMaxMapTasks
public Integer getMaxMapTasks()
- Specified by:
getMaxMapTasks
in interface MapReduce
-
setMaxMapTasks
public void setMaxMapTasks(Integer maxMapTasks)
- Specified by:
setMaxMapTasks
in interface MapReduce
-
getMinReduceTasks
public Integer getMinReduceTasks()
- Specified by:
getMinReduceTasks
in interface MapReduce
-
setMinReduceTasks
public void setMinReduceTasks(Integer minReduceTasks)
- Specified by:
setMinReduceTasks
in interface MapReduce
-
getMaxReduceTasks
public Integer getMaxReduceTasks()
- Specified by:
getMaxReduceTasks
in interface MapReduce
-
setMaxReduceTasks
public void setMaxReduceTasks(Integer maxReduceTasks)
- Specified by:
setMaxReduceTasks
in interface MapReduce
-
isUseProvidedSplits
public boolean isUseProvidedSplits()
- Specified by:
isUseProvidedSplits
in interface MapReduce
-
setUseProvidedSplits
public void setUseProvidedSplits(boolean useProvidedSplits)
- Specified by:
setUseProvidedSplits
in interface MapReduce
-
getPartitioner
public Class<? extends org.apache.hadoop.mapreduce.Partitioner> getPartitioner()
- Specified by:
getPartitioner
in interface MapReduce
-
setPartitioner
public void setPartitioner(Class<? extends org.apache.hadoop.mapreduce.Partitioner> partitioner)
- Specified by:
setPartitioner
in interface MapReduce
-
getCommandLineArgs
public String[] getCommandLineArgs()
- Specified by:
getCommandLineArgs
in interface MapReduce
-
setCommandLineArgs
public void setCommandLineArgs(String[] commandLineArgs)
- Specified by:
setCommandLineArgs
in interface MapReduce
-
getCompressionCodec
public Class<? extends org.apache.hadoop.io.compress.CompressionCodec> getCompressionCodec()
-
setCompressionCodec
public void setCompressionCodec(Class<? extends org.apache.hadoop.io.compress.CompressionCodec> compressionCodec)
-
getOptions
public Map<String,String> getOptions()
- Specified by:
getOptions
in interface Operation
- Returns:
- the operation options. This may contain store-specific options such as authorisation strings or other properties required for the operation to be executed. Note these options will probably not be interpreted in the same way by every store implementation.
-
setOptions
public void setOptions(Map<String,String> options)
- Specified by:
setOptions
in interface Operation
- Parameters:
options
- the operation options. This may contain store-specific options such as authorisation strings or other properties required for the operation to be executed. Note these options will probably not be interpreted in the same way by every store implementation.
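A minimal sketch of attaching options (the option key used here is purely illustrative; as noted above, real keys are store-specific):

```java
import uk.gov.gchq.gaffer.hdfs.operation.SampleDataForSplitPoints;

import java.util.HashMap;
import java.util.Map;

public class OptionsExample {
    public static void main(String[] args) {
        final SampleDataForSplitPoints op = new SampleDataForSplitPoints();
        final Map<String, String> options = new HashMap<>();
        // Hypothetical key: its interpretation depends on the target store.
        options.put("gaffer.example.auths", "public");
        op.setOptions(options);
        System.out.println(op.getOptions());
    }
}
```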
-
shallowClone
public SampleDataForSplitPoints shallowClone()
Description copied from interface: Operation
Operation implementations should ensure a shallowClone method is implemented. Performs a shallow clone: it creates a new instance and copies the fields across. It does not clone the fields themselves. If the operation contains nested operations, these must also be cloned.
- Specified by:
shallowClone
in interface Operation
- Returns:
- shallow clone
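A sketch of what the shallow-copy semantics described above imply in practice (field names are taken from this page):

```java
import uk.gov.gchq.gaffer.hdfs.operation.SampleDataForSplitPoints;

public class ShallowCloneExample {
    public static void main(String[] args) {
        final SampleDataForSplitPoints original = new SampleDataForSplitPoints();
        original.setOutputPath("/data/output");
        original.setProportionToSample(0.01f);

        // A new instance with the same field values; mutable fields such as
        // the options map are copied by reference, not deep-cloned.
        final SampleDataForSplitPoints copy = original.shallowClone();
        System.out.println(copy != original);     // a distinct instance
        System.out.println(copy.getOutputPath()); // same value as the original
    }
}
```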
-
-