Class AccumuloStoreRelation
- java.lang.Object
  - org.apache.spark.sql.sources.BaseRelation
    - uk.gov.gchq.gaffer.sparkaccumulo.operation.handler.dataframe.AccumuloStoreRelation

All Implemented Interfaces:
org.apache.spark.sql.sources.PrunedFilteredScan, org.apache.spark.sql.sources.PrunedScan, org.apache.spark.sql.sources.TableScan
public class AccumuloStoreRelation
extends org.apache.spark.sql.sources.BaseRelation
implements org.apache.spark.sql.sources.TableScan, org.apache.spark.sql.sources.PrunedScan, org.apache.spark.sql.sources.PrunedFilteredScan

Allows Apache Spark to retrieve data from an AccumuloStore as a DataFrame. Spark's Java API does not expose the DataFrame class, but it is just a type alias for a Dataset of Rows. The schema of the DataFrame is formed from the schemas of the groups specified in the view. If two of the specified groups have properties with the same name, then the types of those properties must be the same.

AccumuloStoreRelation implements the TableScan interface, which allows all Elements of the specified groups to be returned to the DataFrame.

AccumuloStoreRelation implements the PrunedScan interface, which allows all Elements of the specified groups to be returned to the DataFrame, but with only the specified columns. Currently, AccumuloStore does not allow projection of properties in the tablet server, so this projection is performed within the Spark executors rather than in Accumulo's tablet servers. Once AccumuloStore supports projection in the tablet servers, this will become more efficient.

AccumuloStoreRelation implements the PrunedFilteredScan interface, which allows only Elements that match the provided Filters to be returned. The majority of these Filters are implemented by adding them to the View, which causes them to be applied on Accumulo's tablet servers (i.e. before the data is sent to a Spark executor). If a Filter specifies either the vertex in an Entity, or the source or destination vertex in an Edge, then it is applied by using the appropriate range scan on Accumulo. Queries against this DataFrame that do this should be very quick.
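This relation is normally created by Gaffer's DataFrame operation handler rather than constructed directly. The following is a minimal sketch of how the resulting Dataset of Rows might be obtained and queried from Spark's Java API; the GetDataFrameOfElements operation usage, the group name "BasicEdge" and the column name "src" are assumptions for illustration and may differ between Gaffer versions.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

import uk.gov.gchq.gaffer.data.elementdefinition.view.View;
import uk.gov.gchq.gaffer.graph.Graph;
import uk.gov.gchq.gaffer.operation.OperationException;
import uk.gov.gchq.gaffer.spark.operation.dataframe.GetDataFrameOfElements;
import uk.gov.gchq.gaffer.user.User;

public final class AccumuloDataFrameExample {

    // 'graph' is assumed to be backed by an AccumuloStore and configured elsewhere.
    public static void query(final Graph graph, final User user) throws OperationException {
        // The View determines which groups contribute to the DataFrame's schema.
        final GetDataFrameOfElements operation = new GetDataFrameOfElements.Builder()
                .view(new View.Builder()
                        .edge("BasicEdge") // hypothetical edge group
                        .build())
                .build();

        final Dataset<Row> df = graph.execute(operation, user);

        // A filter on the source vertex can be converted into an Accumulo range scan
        // by the PrunedFilteredScan implementation, so this query should be quick.
        df.filter("src = 'A'").show();
    }
}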
Constructor Summary

Constructors:
AccumuloStoreRelation(Context context, List<Converter> converters, View view, AccumuloStore store, Map<String,String> options)
Method Summary

org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> buildScan()
Creates a DataFrame of all Elements from the specified groups.

org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> buildScan(String[] requiredColumns)
Creates a DataFrame of all Elements from the specified groups with columns that are not required filtered out.

org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> buildScan(String[] requiredColumns, org.apache.spark.sql.sources.Filter[] filters)
Creates a DataFrame of all Elements from the specified groups with columns that are not required filtered out and with (some of) the supplied Filters applied.

org.apache.spark.sql.types.StructType schema()

org.apache.spark.sql.SQLContext sqlContext()
Method Detail
sqlContext
public org.apache.spark.sql.SQLContext sqlContext()
- Specified by:
  sqlContext in class org.apache.spark.sql.sources.BaseRelation

schema
public org.apache.spark.sql.types.StructType schema()
- Specified by:
  schema in class org.apache.spark.sql.sources.BaseRelation
buildScan
public org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> buildScan()
Creates a DataFrame of all Elements from the specified groups.
- Specified by:
  buildScan in interface org.apache.spark.sql.sources.TableScan
- Returns:
  An RDD of Rows containing Elements whose group is in groups.
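As a rough sketch, a query with no column pruning and no filters can be served by this method (which of the scan interfaces Spark actually uses for a given query is decided by its optimiser); 'df' is assumed to be a Dataset of Rows backed by this relation.

import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public final class FullScanExample {
    // 'df' is assumed to be a Dataset<Row> backed by AccumuloStoreRelation.
    public static void fullScan(final Dataset<Row> df) {
        // No column pruning and no filters: every Element from the groups in the
        // View is returned, and here collected to the driver.
        final List<Row> allRows = df.collectAsList();
        System.out.println("Retrieved " + allRows.size() + " rows");
    }
}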
buildScan
public org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> buildScan(String[] requiredColumns)
Creates aDataFrameof allElements from the specified groups with columns that are not required filtered out.Currently this does not push the projection down to the store (i.e. it should be implemented in an iterator, not in the transform). Issue 320 refers to this.
- Specified by:
buildScanin interfaceorg.apache.spark.sql.sources.PrunedScan- Parameters:
requiredColumns- The columns to return.- Returns:
- An
RDDofRows containing the requested columns.
-
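A sketch of column pruning from the Spark side, assuming 'df' is backed by this relation; "group" is the group column of a Gaffer DataFrame and "count" is a hypothetical property column.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public final class ColumnPruningExample {
    // 'df' is assumed to be a Dataset<Row> backed by AccumuloStoreRelation.
    public static Dataset<Row> selectColumns(final Dataset<Row> df) {
        // Selecting a subset of columns allows Spark to use the PrunedScan
        // buildScan; the unwanted columns are currently dropped in the Spark
        // executors rather than in Accumulo's tablet servers (see Issue 320).
        return df.select("group", "count");
    }
}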
buildScan
public org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> buildScan(String[] requiredColumns, org.apache.spark.sql.sources.Filter[] filters)
Creates a DataFrame of all Elements from the specified groups with columns that are not required filtered out and with (some of) the supplied Filters applied. Note that Spark also applies the provided Filters - applying them here is an optimisation to reduce the amount of data transferred from the store to Spark's executors (this is known as "predicate pushdown"). Currently this does not push the projection down to the store (i.e. it should be implemented in an iterator, not in the transform). Issue 320 refers to this.
- Specified by:
  buildScan in interface org.apache.spark.sql.sources.PrunedFilteredScan
- Parameters:
  requiredColumns - The columns to return.
  filters - The Filters to apply (these are applied before aggregation).
- Returns:
  An RDD of Rows containing the requested columns.
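A sketch of a query that benefits from predicate pushdown, assuming 'df' is backed by this relation; the column names "src", "dst" and "count" are illustrative only.

import static org.apache.spark.sql.functions.col;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public final class FilterPushdownExample {
    // 'df' is assumed to be a Dataset<Row> backed by AccumuloStoreRelation.
    public static Dataset<Row> filteredScan(final Dataset<Row> df) {
        // The equality filter on the source vertex can be converted into an Accumulo
        // range scan, and the filter on "count" can be added to the View so that it
        // is applied in the tablet servers, before data reaches the Spark executors.
        return df.filter(col("src").equalTo("A").and(col("count").gt(10)))
                 .select("src", "dst", "count");
    }
}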