Class AccumuloStoreRelation

java.lang.Object
    org.apache.spark.sql.sources.BaseRelation
        uk.gov.gchq.gaffer.sparkaccumulo.operation.handler.dataframe.AccumuloStoreRelation

All Implemented Interfaces:
    org.apache.spark.sql.sources.PrunedFilteredScan, org.apache.spark.sql.sources.PrunedScan, org.apache.spark.sql.sources.TableScan

public class AccumuloStoreRelation
extends org.apache.spark.sql.sources.BaseRelation
implements org.apache.spark.sql.sources.TableScan, org.apache.spark.sql.sources.PrunedScan, org.apache.spark.sql.sources.PrunedFilteredScan
Allows Apache Spark to retrieve data from an AccumuloStore as a DataFrame. Spark's Java API does not expose the DataFrame class, but it is just a type alias for a Dataset of Rows. The schema of the DataFrame is formed from the schemas of the groups specified in the view. If two of the specified groups have properties with the same name, then the types of those properties must be the same.

AccumuloStoreRelation implements the TableScan interface, which allows all Elements of the specified groups to be returned to the DataFrame.

AccumuloStoreRelation implements the PrunedScan interface, which allows all Elements of the specified groups to be returned to the DataFrame with only the specified columns. Currently, AccumuloStore does not allow projection of properties in the tablet server, so this projection is performed within the Spark executors rather than in Accumulo's tablet servers. Once AccumuloStore supports projection in the tablet servers, this will become more efficient.

AccumuloStoreRelation implements the PrunedFilteredScan interface, which allows only Elements that match the provided Filters to be returned. The majority of these are implemented by adding them to the View, which causes them to be applied on Accumulo's tablet servers (i.e. before the data is sent to a Spark executor). If a Filter is specified on either the vertex of an Entity or the source or destination vertex of an Edge, then it is applied using the appropriate range scan on Accumulo. Queries against this DataFrame that do this should be very quick.
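
The sketch below is a minimal illustration of how this behaviour typically surfaces on the Spark side; it is not part of this class's API. It assumes a Dataset<Row> (df) that is already backed by this relation, and the column name "vertex" and the value "user1" are hypothetical and depend on the Gaffer schema in use.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public final class AccumuloStoreRelationUsageSketch {

    public static void query(final Dataset<Row> df) {
        // Full scan: Spark calls buildScan() via the TableScan interface and all
        // Elements of the groups in the view are returned.
        df.show();

        // Filtering on the (hypothetical) vertex column: Spark passes an EqualTo
        // filter to buildScan(String[], Filter[]), which this relation turns into a
        // range scan on Accumulo, so only matching data leaves the tablet servers.
        df.filter("vertex = 'user1'").show();
    }
}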
Constructor Summary

Constructors
    AccumuloStoreRelation(Context context, List<Converter> converters, View view, AccumuloStore store, Map<String,String> options)
Method Summary

org.apache.spark.rdd.RDD<org.apache.spark.sql.Row>  buildScan()
    Creates a DataFrame of all Elements from the specified groups.

org.apache.spark.rdd.RDD<org.apache.spark.sql.Row>  buildScan(String[] requiredColumns)
    Creates a DataFrame of all Elements from the specified groups with columns that are not required filtered out.

org.apache.spark.rdd.RDD<org.apache.spark.sql.Row>  buildScan(String[] requiredColumns, org.apache.spark.sql.sources.Filter[] filters)
    Creates a DataFrame of all Elements from the specified groups with columns that are not required filtered out and with (some of) the supplied Filters applied.

org.apache.spark.sql.types.StructType  schema()

org.apache.spark.sql.SQLContext  sqlContext()
Method Detail
sqlContext

public org.apache.spark.sql.SQLContext sqlContext()

Specified by:
    sqlContext in class org.apache.spark.sql.sources.BaseRelation
schema

public org.apache.spark.sql.types.StructType schema()

Specified by:
    schema in class org.apache.spark.sql.sources.BaseRelation
buildScan

public org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> buildScan()

Creates a DataFrame of all Elements from the specified groups.

Specified by:
    buildScan in interface org.apache.spark.sql.sources.TableScan
Returns:
    An RDD of Rows containing Elements whose group is in groups.
buildScan

public org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> buildScan(String[] requiredColumns)

Creates a DataFrame of all Elements from the specified groups with columns that are not required filtered out. Currently this does not push the projection down to the store (i.e. it should be implemented in an iterator, not in the transform). Issue 320 refers to this.

Specified by:
    buildScan in interface org.apache.spark.sql.sources.PrunedScan
Parameters:
    requiredColumns - The columns to return.
Returns:
    An RDD of Rows containing the requested columns.
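
As a minimal illustration (again assuming a Dataset<Row> backed by this relation, with hypothetical column names), selecting a subset of columns is what causes Spark to call this method; the projection itself is then performed on the Spark executors, as noted above.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public final class PrunedScanSketch {

    public static Dataset<Row> pruneColumns(final Dataset<Row> df) {
        // Only "group" and "count" are kept in the resulting rows; the other
        // properties are dropped in the transform performed on the executors.
        return df.select("group", "count");
    }
}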
buildScan

public org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> buildScan(String[] requiredColumns, org.apache.spark.sql.sources.Filter[] filters)

Creates a DataFrame of all Elements from the specified groups with columns that are not required filtered out and with (some of) the supplied Filters applied. Note that Spark also applies the provided Filters - applying them here is an optimisation to reduce the amount of data transferred from the store to Spark's executors (this is known as "predicate pushdown"). Currently this does not push the projection down to the store (i.e. it should be implemented in an iterator, not in the transform). Issue 320 refers to this.

Specified by:
    buildScan in interface org.apache.spark.sql.sources.PrunedFilteredScan
Parameters:
    requiredColumns - The columns to return.
    filters - The Filters to apply (these are applied before aggregation).
Returns:
    An RDD of Rows containing the requested columns.
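
The sketch below shows what this method receives. Spark normally builds the Filter array itself from the query plan; constructing it by hand here is purely illustrative, and the attribute name "vertex" and value "user1", like the column names, are hypothetical and depend on the Gaffer schema.

import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.sources.EqualTo;
import org.apache.spark.sql.sources.Filter;
import uk.gov.gchq.gaffer.sparkaccumulo.operation.handler.dataframe.AccumuloStoreRelation;

public final class PrunedFilteredScanSketch {

    public static RDD<Row> scan(final AccumuloStoreRelation relation) {
        final String[] requiredColumns = {"group", "vertex", "count"};
        // An equality filter on the vertex is applied as a range scan in Accumulo;
        // Spark then re-applies the same filters to the rows that are returned.
        final Filter[] filters = {new EqualTo("vertex", "user1")};
        return relation.buildScan(requiredColumns, filters);
    }
}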